Bank Churn Prediction

Problem Statement¶

Context¶

Businesses like banks which provide service have to worry about problem of 'Customer Churn' i.e. customers leaving and joining another service provider. It is important to understand which aspects of the service influence a customer's decision in this regard. Management can concentrate efforts on improvement of service, keeping in mind these priorities.

Objective¶

You as a Data scientist with the bank need to build a neural network based classifier that can determine whether a customer will leave the bank or not in the next 6 months.

Data Dictionary¶

  • CustomerId: Unique ID which is assigned to each customer

  • Surname: Last name of the customer

  • CreditScore: It defines the credit history of the customer.

  • Geography: A customer’s location

  • Gender: It defines the Gender of the customer

  • Age: Age of the customer

  • Tenure: Number of years for which the customer has been with the bank

  • NumOfProducts: refers to the number of products that a customer has purchased through the bank.

  • Balance: Account balance

  • HasCrCard: It is a categorical variable which decides whether the customer has credit card or not.

  • EstimatedSalary: Estimated salary

  • IsActiveMember: Is is a categorical variable which decides whether the customer is active member of the bank or not ( Active member in the sense, using bank products regularly, making transactions etc )

  • Exited : whether or not the customer left the bank within six month. It can take two values

    • 0 = No ( Customer did not leave the bank )
    • 1 = Yes ( Customer left the bank )
In [ ]:
 

Please read the instructions carefully before starting the project.¶

This is a commented Jupyter IPython Notebook file in which all the instructions and tasks to be performed are mentioned.

  • Blanks '___' are provided in the notebook that needs to be filled with an appropriate code to get the correct result. With every '___' blank, there is a comment that briefly describes what needs to be filled in the blank space.
  • Identify the task to be performed correctly, and only then proceed to write the required code.
  • Fill the code wherever asked by the commented lines like "# write your code here" or "# complete the code". Running incomplete code may throw error.
  • Please run the codes in a sequential manner from the beginning to avoid any unnecessary errors.
  • Add the results/observations (wherever mentioned) derived from the analysis in the presentation and submit the same.

Importing necessary libraries¶

In [ ]:
# Installing the libraries with the specified version.
!pip install tensorflow==2.15.0 scikit-learn==1.2.2 seaborn==0.13.1 matplotlib==3.7.1 numpy==1.25.2 pandas==2.0.3 imbalanced-learn==0.10.1 -q --user

Note: After running the above cell, please restart the notebook kernel/runtime (depending on whether you're using Jupyter Notebook or Google Colab) and then sequentially run all cells from the one below.

In [ ]:
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np

# libaries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Library to split data
from sklearn.model_selection import train_test_split

# library to import to standardize the data
from sklearn.preprocessing import StandardScaler, LabelEncoder

# importing different functions to build models
import tensorflow as tf
from tensorflow import keras
from keras import backend
from keras.models import Sequential
from keras.layers import Dense, Dropout

# importing SMOTE
from imblearn.over_sampling import SMOTE

# importing metrics
from sklearn.metrics import confusion_matrix,roc_curve,classification_report,recall_score

import random

# Library to avoid the warnings
import warnings
warnings.filterwarnings("ignore")

Loading the dataset¶

In [ ]:
# Uncomment and run the following lines in case Colab is being used
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
ds = pd.read_csv('/content/drive/MyDrive/Colab Datafiles/Churn.csv')# complete the code to load the dataset
ds1 = ds.copy()

Data Overview¶

View the first and last 5 rows of the dataset.¶

In [ ]:
# let's view the first 5 rows of the data
ds.head() ##  Complete the code to view top 5 rows of the data
Out[ ]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 1 15634602 Hargrave 619 France Female 42 2 0.00 1 1 1 101348.88 1
1 2 15647311 Hill 608 Spain Female 41 1 83807.86 1 0 1 112542.58 0
2 3 15619304 Onio 502 France Female 42 8 159660.80 3 1 0 113931.57 1
3 4 15701354 Boni 699 France Female 39 1 0.00 2 0 0 93826.63 0
4 5 15737888 Mitchell 850 Spain Female 43 2 125510.82 1 1 1 79084.10 0
In [ ]:
# let's view the last 5 rows of the data
ds.tail() ##  Complete the code to view last 5 rows of the data
Out[ ]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
9995 9996 15606229 Obijiaku 771 France Male 39 5 0.00 2 1 0 96270.64 0
9996 9997 15569892 Johnstone 516 France Male 35 10 57369.61 1 1 1 101699.77 0
9997 9998 15584532 Liu 709 France Female 36 7 0.00 1 0 1 42085.58 1
9998 9999 15682355 Sabbatini 772 Germany Male 42 3 75075.31 2 1 0 92888.52 1
9999 10000 15628319 Walker 792 France Female 28 4 130142.79 1 1 0 38190.78 0

Understand the shape of the dataset¶

In [ ]:
# Checking the number of rows and columns in the training data
ds.shape##  Complete the code to view dimensions of the train data
Out[ ]:
(10000, 14)

Check the data types of the columns for the dataset¶

In [ ]:
ds.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB

Checking the Statistical Summary¶

In [ ]:
ds.describe().T
Out[ ]:
count mean std min 25% 50% 75% max
RowNumber 10000.0 5.000500e+03 2886.895680 1.00 2500.75 5.000500e+03 7.500250e+03 10000.00
CustomerId 10000.0 1.569094e+07 71936.186123 15565701.00 15628528.25 1.569074e+07 1.575323e+07 15815690.00
CreditScore 10000.0 6.505288e+02 96.653299 350.00 584.00 6.520000e+02 7.180000e+02 850.00
Age 10000.0 3.892180e+01 10.487806 18.00 32.00 3.700000e+01 4.400000e+01 92.00
Tenure 10000.0 5.012800e+00 2.892174 0.00 3.00 5.000000e+00 7.000000e+00 10.00
Balance 10000.0 7.648589e+04 62397.405202 0.00 0.00 9.719854e+04 1.276442e+05 250898.09
NumOfProducts 10000.0 1.530200e+00 0.581654 1.00 1.00 1.000000e+00 2.000000e+00 4.00
HasCrCard 10000.0 7.055000e-01 0.455840 0.00 0.00 1.000000e+00 1.000000e+00 1.00
IsActiveMember 10000.0 5.151000e-01 0.499797 0.00 0.00 1.000000e+00 1.000000e+00 1.00
EstimatedSalary 10000.0 1.000902e+05 57510.492818 11.58 51002.11 1.001939e+05 1.493882e+05 199992.48
Exited 10000.0 2.037000e-01 0.402769 0.00 0.00 0.000000e+00 0.000000e+00 1.00

Checking for Missing Values¶

In [ ]:
# let's check for missing values in the data
ds.isnull() ##  Complete the code to check missing entries in the train data
Out[ ]:
RowNumber CustomerId Surname CreditScore Geography Gender Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited
0 False False False False False False False False False False False False False False
1 False False False False False False False False False False False False False False
2 False False False False False False False False False False False False False False
3 False False False False False False False False False False False False False False
4 False False False False False False False False False False False False False False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 False False False False False False False False False False False False False False
9996 False False False False False False False False False False False False False False
9997 False False False False False False False False False False False False False False
9998 False False False False False False False False False False False False False False
9999 False False False False False False False False False False False False False False

10000 rows × 14 columns

In [ ]:
ds.isnull().sum()
Out[ ]:
RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
In [ ]:
# Check for missing values
missing_values = ds.isnull().sum()

# Check for duplicate rows
duplicate_rows = ds.duplicated().sum()

# Check for unknown values (e.g., placeholders like '?', 'NA', 'unknown')
unknown_values = (ds == '?').sum() + (ds == 'NA').sum() + (ds == 'unknown').sum()

# Display the results
print("Missing values in each column:\n", missing_values)
print("\nNumber of duplicate rows: ", duplicate_rows)
print("\nUnknown values in each column:\n", unknown_values)
Missing values in each column:
 RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

Number of duplicate rows:  0

Unknown values in each column:
 RowNumber          0
CustomerId         0
Surname            0
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64

Checking for unique values for each of the column¶

In [ ]:
ds.nunique()
Out[ ]:
RowNumber          10000
CustomerId         10000
Surname             2932
CreditScore          460
Geography              3
Gender                 2
Age                   70
Tenure                11
Balance             6382
NumOfProducts          4
HasCrCard              2
IsActiveMember         2
EstimatedSalary     9999
Exited                 2
dtype: int64
In [ ]:
#RowNumber , CustomerId and Surname are unique hence dropping it
ds = ds.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

Exploratory Data Analysis¶

Univariate Analysis¶

In [ ]:
import matplotlib.pyplot as plt
import seaborn as sns

# Calculate summary statistics for numerical variables
numerical_vars = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']
summary_stats = ds[numerical_vars].describe()

summary_stats
Out[ ]:
CreditScore Age Tenure Balance NumOfProducts EstimatedSalary
count 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000
mean 650.528800 38.921800 5.012800 76485.889288 1.530200 100090.239881
std 96.653299 10.487806 2.892174 62397.405202 0.581654 57510.492818
min 350.000000 18.000000 0.000000 0.000000 1.000000 11.580000
25% 584.000000 32.000000 3.000000 0.000000 1.000000 51002.110000
50% 652.000000 37.000000 5.000000 97198.540000 1.000000 100193.915000
75% 718.000000 44.000000 7.000000 127644.240000 2.000000 149388.247500
max 850.000000 92.000000 10.000000 250898.090000 4.000000 199992.480000

Observations

Feature Scaling:

Since features like CreditScore, Age, Balance, and EstimatedSalary have different ranges and units, it is essential to standardize or normalize these features to ensure the model performs well.

Handling Zero Balances:

The presence of a significant number of customers with a balance of 0 should be explored further. These could be a separate class or might need special handling during model training.

Tenure and Product Usage:

Tenure and NumOfProducts indicate customer loyalty and engagement. These features can be crucial for predicting churn and should be carefully analyzed for their impact on the model.

Age Diversity:

The wide age range suggests that different age groups may have different behaviors. Age might interact with other features like Balance and NumOfProducts in interesting ways.

Outliers:

High values in Balance and EstimatedSalary may act as outliers. It's important to check if these outliers disproportionately affect the model and consider techniques like log transformation if needed.

Next Steps

Data Preprocessing:

Normalize/standardize the numerical features. Encode categorical variables like Geography and Gender.

Feature Engineering:

Investigate zero balances and consider creating new features that capture the interaction between Balance and NumOfProducts.

Model Training:

Use the preprocessed data to train a neural network model, ensuring that features are scaled and properly encoded.

In [ ]:
# Histograms
ds[numerical_vars].hist(bins=30, figsize=(15, 10))
plt.suptitle('Histograms of Numerical Variables')
plt.show()

# Box plots
plt.figure(figsize=(15, 10))
for i, var in enumerate(numerical_vars):
    plt.subplot(2, 3, i + 1)
    sns.boxplot(y=ds[var])
    plt.title(f'Box plot of {var}')
plt.tight_layout()
plt.show()

Observations and Key Takeaways from the Histograms

1. CreditScore: • Distribution: The credit scores are approximately normally distributed with a peak around 650-700. • Skewness: There is a slight left skew with more customers having higher credit scores closer to the maximum value of 850. • Implication: Credit score is a crucial variable and its normal distribution suggests that the dataset is balanced in terms of creditworthiness. Standardizing this variable will help in modeling.

2. Age: • Distribution: Age shows a right-skewed distribution with a significant number of customers in the 30-40 age range. • Implication: The concentration of customers in a specific age group (30-40) indicates that age could be a significant factor in customer behavior and churn prediction. It might be beneficial to create age groups or bins to better capture trends.

3. Tenure: • Distribution: Tenure appears to be uniformly distributed with a slight decrease at the higher end (10 years). • Implication: Since tenure is spread across all values with no specific trend, it might be used as is or binned to understand its impact on churn better.

4. Balance: • Distribution: Balance has a large number of customers with a zero balance, while the rest of the distribution shows a normal-like spread with a peak around 100,000. • Implication: The zero-balance customers might need special consideration as they could represent inactive or low-value customers. This feature will likely be crucial for identifying churn patterns.

5. NumOfProducts: • Distribution: Most customers have either 1 or 2 products, with very few having 3 or 4 products. • Implication: The small number of customers with 3 or more products indicates that those with more products might have different behavior patterns. This can be a significant predictor of churn, as customers with fewer products might be more likely to leave.

  1. EstimatedSalary: • Distribution: Estimated salary is evenly distributed across the range. • Implication: The even distribution of estimated salary suggests that income level may not directly correlate with churn, but it should still be included as it might interact with other variables (e.g., balance or number of products).

Key Takeaways for Model Building

  1. Feature Engineering: • CreditScore and Age: Standardize these features to normalize their ranges. • Age and Tenure: Consider creating bins to capture the effect of different age groups and tenure periods. • Balance: Handle zero-balance customers separately, as they might represent a different customer segment. • NumOfProducts: Treat customers with 3 or more products as a distinct group for analysis.
  2. Model Sensitivity: • The distribution insights can guide how to handle feature scaling and transformation. • Ensure that the model can capture non-linear relationships and interactions between features (e.g., using interaction terms or higher-degree polynomial features if necessary).
  3. Data Preprocessing: • Standardize numerical features to ensure they contribute equally to the model. • Encode categorical variables (e.g., Geography and Gender) appropriately.
  4. Evaluation: • Check for overfitting, especially considering the zero-balance customers. • Use stratified sampling to ensure that the training and test sets have similar distributions.
In [ ]:
# Frequency counts for categorical variables
categorical_vars = ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember', 'Exited']
frequency_counts = {var: ds[var].value_counts() for var in categorical_vars}

frequency_counts
Out[ ]:
{'Geography': Geography
 France     5014
 Germany    2509
 Spain      2477
 Name: count, dtype: int64,
 'Gender': Gender
 Male      5457
 Female    4543
 Name: count, dtype: int64,
 'HasCrCard': HasCrCard
 1    7055
 0    2945
 Name: count, dtype: int64,
 'IsActiveMember': IsActiveMember
 1    5151
 0    4849
 Name: count, dtype: int64,
 'Exited': Exited
 0    7963
 1    2037
 Name: count, dtype: int64}
In [ ]:
# Bar plots
plt.figure(figsize=(15, 10))
for i, var in enumerate(categorical_vars):
    plt.subplot(3, 2, i + 1)
    sns.countplot(x=ds[var])
    plt.title(f'Bar plot of {var}')
plt.tight_layout()
plt.show()

Key Takeaways from the Bar Plots¶

1. Geography: • Distribution: • France has the highest number of customers, followed by Spain and Germany. • Implication: • This geographic distribution should be taken into account, as customers from different regions may have different behaviors and churn rates. It might be useful to include interactions between Geography and other features in the model.

2. Gender: • Distribution: • The number of male and female customers is almost equal. • Implication: • Gender balance suggests that any gender-based differences in churn can be observed and modeled. It is important to include gender as a feature to see if it influences churn.

3. HasCrCard: • Distribution: • A significant majority of customers have a credit card. • Implication: • The possession of a credit card might be a factor in customer engagement and satisfaction. Analyzing the impact of having a credit card on churn can provide valuable insights.

4. IsActiveMember: • Distribution: • The distribution is nearly even between active and non-active members. • Implication: • Whether a customer is an active member could be a crucial factor in predicting churn. Active members might be less likely to churn compared to inactive ones.

5. Exited: • Distribution: • There is a clear imbalance, with many more customers not exiting (churning) compared to those who did exit. • Implication: • The imbalance in the target variable (Exited) suggests that class imbalance techniques (e.g., SMOTE, undersampling) might be necessary to improve model performance. It’s important to address this imbalance to ensure the model accurately predicts both classes.

Summary for Model Building

1. Geographical Insights: • Incorporate the geographic distribution into feature engineering. Consider region-specific models or adding interaction terms between Geography and other features.

2. Gender Analysis: • Include gender as a feature and explore its interaction with other variables to see if there are significant patterns related to churn.

3. Credit Card Ownership: • Analyze how having a credit card impacts churn. This could be an important feature, possibly indicating higher engagement or dependency on bank services.

4. Customer Activity: • The status of being an active member should be prominently featured in the model. It is a strong indicator of customer engagement and likely retention.

5. Class Imbalance: • Apply techniques to address class imbalance in the target variable to ensure that the model does not become biased towards the majority class (non-churn).

Long given Univariate Analysis¶

In [ ]:
# function to plot a boxplot and a histogram along the same scale.


def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a star will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
In [ ]:
 
In [ ]:
# function to create labeled barplots


def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc == True:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # width of the plot
        y = p.get_height()  # height of the plot

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot

Observations on CreditScore¶

In [ ]:
histogram_boxplot(ds,'CreditScore')

Observations on Age¶

In [ ]:
histogram_boxplot(ds,'Age')          ## Complete the code to create histogram_boxplot for Age

Observations on Balance¶

In [ ]:
histogram_boxplot(ds,'Balance')          ## Complete the code to create histogram_boxplot for Balance

Observations on Estimated Salary¶

In [ ]:
histogram_boxplot(ds,'EstimatedSalary')          ## Complete the code to create histogram_boxplot for Estimated Salary

Observations on Exited¶

In [ ]:
labeled_barplot(ds, "Exited", perc=True)

Observations on Geography¶

In [ ]:
labeled_barplot(ds, "Geography", perc=True)               ## Complete the code to create labeled_barplot for Geography

Observations on Gender¶

In [ ]:
labeled_barplot(ds, "Gender", perc=True)               ## Complete the code to create labeled_barplot for Gender

Observations on Tenure¶

In [ ]:
labeled_barplot(ds, "Tenure", perc=True)               ## Complete the code to create labeled_barplot for Tenure

Observations on Number of Products¶

In [ ]:
labeled_barplot(ds, "NumOfProducts", perc=True)               ## Complete the code to create labeled_barplot for Number of products

Observations on Has Credit Card¶

In [ ]:
labeled_barplot(ds, "HasCrCard", perc=True)               ## Complete the code to create labeled_barplot for Has credit card

Observations on Is Active Member¶

In [ ]:
labeled_barplot(ds, "IsActiveMember", perc=True)               ## Complete the code to create labeled_barplot for Is active member

Bivariate Analysis¶

In [ ]:
# function to plot stacked bar chart


def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 1, 5))
    plt.legend(
        loc="lower left",
        frameon=False,
    )
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()

Correlation plot¶

In [ ]:
# defining the list of numerical columns
cols_list = ["CreditScore","Age","Tenure","Balance","EstimatedSalary"]
In [ ]:
plt.figure(figsize=(15, 7))
sns.heatmap(ds[cols_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()

Key Takeaways from the Correlation Matrix

Correlation Values The correlation matrix provides the correlation coefficients between pairs of features in the dataset. These coefficients range from -1 to 1, indicating the strength and direction of the linear relationship between the variables. Here are the key observations:

  1. CreditScore: • Weak positive correlation with Balance (0.01) and Tenure (0.00). • Virtually no correlation with Age (-0.00) and EstimatedSalary (-0.00).
  2. Age: • Weak positive correlation with Balance (0.03). • Weak negative correlation with Tenure (-0.01). • Virtually no correlation with CreditScore (-0.00) and EstimatedSalary (-0.01).
  3. Tenure: • Weak positive correlation with EstimatedSalary (0.01) and CreditScore (0.00). • Weak negative correlation with Age (-0.01). • Virtually no correlation with Balance (-0.01).
  4. Balance: • Weak positive correlation with Age (0.03). • Weak positive correlation with EstimatedSalary (0.01) and CreditScore (0.01). • Weak negative correlation with Tenure (-0.01).
  5. EstimatedSalary: • Weak positive correlation with Balance (0.01), CreditScore (-0.00), Tenure (0.01). • Virtually no correlation with Age (-0.01). Key Insights
  6. Weak Correlations: • The correlation coefficients between all pairs of features are very close to zero, indicating very weak or no linear relationships between these features.
  7. Feature Independence: • The weak correlations suggest that these features are relatively independent of each other. This independence can be beneficial for machine learning models, as it reduces multicollinearity, allowing each feature to contribute uniquely to the model.
  8. Implications for Model Building: • Given the weak correlations, all these features can be considered for inclusion in the model as they provide distinct information about the customers. • It’s important to use techniques like feature importance analysis after model training to determine the actual impact of each feature on the prediction.
  9. Additional Considerations: • Since the correlations are weak, it might be useful to explore non-linear relationships or interactions between features that might not be captured by simple linear correlation. Summary The correlation matrix indicates that the features in the dataset have very weak linear relationships with each other, suggesting that they are largely independent. This can be advantageous for building a predictive model, as it ensures that each feature adds unique information to the model. However, it is essential to explore non-linear relationships and interactions during feature engineering and model building to capture more complex patterns in the data.
In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt

# Define the columns to analyze against 'Exited'
columns_to_analyze = ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary', 'Geography', 'Gender', 'HasCrCard', 'IsActiveMember']

# Plot bivariate analysis for numerical features
for col in ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']:
    plt.figure(figsize=(10, 5))
    sns.boxplot(x='Exited', y=col, data=ds)
    plt.title(f'Bivariate Analysis of {col} vs Exited')
    plt.show()

# Plot bivariate analysis for categorical features
for col in ['Geography', 'Gender', 'HasCrCard', 'IsActiveMember']:
    plt.figure(figsize=(10, 5))
    sns.countplot(x=col, hue='Exited', data=ds)
    plt.title(f'Bivariate Analysis of {col} vs Exited')
    plt.show()
In [ ]:
 

Observations from Bivariate Analysis

Age vs Exited

• Distribution:

• Customers who exited (churned) tend to be older compared to those who did not exit. • Churn Behavior:

• Older customers have a higher propensity to leave the bank.

Key Takeaways

• Age Factor:

• Age is a significant factor in customer churn. Older customers might have different needs or face different issues that lead to higher churn rates.

• Churn Risk:

• Implement strategies to address the specific needs of older customers to reduce their churn rate.

Next Steps

• Feature Importance:

• Include Age as an important feature in the churn prediction model.

• Targeted Strategies:

• Design retention strategies focused on addressing the needs of older customers.

• Further Analysis:

• Investigate interactions between Age and other features to understand specific segments at higher risk.


Tenure vs Exited • Distribution: • The tenure distribution is similar for both exited and non-exited customers, with a slight tendency for customers with higher tenure to churn. • Churn Behavior: • Tenure alone might not be a strong predictor of churn. Key Takeaways • Tenure Factor: • Tenure should be considered in conjunction with other factors for a better understanding of churn. • Churn Risk: • Customers with longer tenure might have accumulated grievances that lead to churn. Next Steps • Feature Importance: • Include Tenure in the model to explore its interaction with other variables. • Targeted Strategies: • Implement strategies to address long-term customer grievances. • Further Analysis: • Analyze the relationship between Tenure and customer satisfaction.


Balance vs Exited • Distribution: • Customers who exited have a slightly higher account balance compared to those who did not exit. • Churn Behavior: • High-balance customers who exit might indicate dissatisfaction despite having significant funds with the bank. Key Takeaways • Balance Factor: • Balance is an important feature to consider. High-balance customers should be closely monitored for churn risk. • Churn Risk: • High-balance customers may require special attention to understand and address their reasons for leaving. Next Steps • Feature Importance: • Include Balance as a key feature in the churn prediction model. • Targeted Strategies: • Design retention efforts focused on high-balance customers. • Further Analysis: • Investigate reasons for churn among high-balance customers.


NumOfProducts vs Exited • Distribution: • The number of products held by customers is similar for both exited and non-exited customers, with a slight increase in the number of products for customers who did not exit. • Churn Behavior: • The number of products might not show a strong direct correlation with churn. Key Takeaways • Product Engagement: • The number of products could still be relevant in combination with other features. • Churn Risk: • Customers with fewer products might be more likely to churn. Next Steps • Feature Importance: • Include NumOfProducts in the model to explore its combined effect with other features. • Targeted Strategies: • Encourage customers to adopt more products to increase engagement. • Further Analysis: • Analyze product combinations and their impact on churn.


EstimatedSalary vs Exited • Distribution: • Estimated salary shows no significant difference between exited and non-exited customers. • Churn Behavior: • Salary might not be a strong predictor of churn. Key Takeaways • Salary Factor: • Estimated salary may not be a significant standalone predictor. • Churn Risk: • Consider salary in combination with other features. Next Steps • Feature Importance: • Include EstimatedSalary in the model to explore potential interactions. • Targeted Strategies: • Focus on other more significant factors for retention strategies. • Further Analysis: • Investigate potential combined effects with other features.


Geography vs Exited • Distribution: • Churn rates vary significantly by geography, with higher churn rates observed in Germany compared to France and Spain. • Churn Behavior: • Geography is a critical factor, with regional differences impacting churn rates. Key Takeaways • Geographic Factor: • Geography significantly influences churn rates. • Churn Risk: • Tailor retention strategies to address region-specific issues. Next Steps • Feature Importance: • Include Geography as a key feature in the churn prediction model. • Targeted Strategies: • Design region-specific retention strategies. • Further Analysis: • Explore regional trends and their impact on customer behavior.


Gender vs Exited • Distribution: • Females have a higher churn rate compared to males. • Churn Behavior: • Gender differences suggest varying needs and issues leading to churn. Key Takeaways • Gender Factor: • Gender is an important predictor of churn. • Churn Risk: • Implement gender-specific retention strategies. Next Steps • Feature Importance: • Include Gender as a significant feature in the churn prediction model. • Targeted Strategies: • Address specific needs and issues of female customers. • Further Analysis: • Investigate further into gender-specific behavior patterns.


HasCrCard vs Exited • Distribution: • Customers without a credit card have a higher proportion of churn compared to those with a credit card. • Churn Behavior: • Credit card ownership is associated with lower churn rates. Key Takeaways • Credit Card Ownership: • Having a credit card appears to be associated with a lower likelihood of churn. • Churn Risk: • Customers without credit cards might be at higher risk of churn. Next Steps • Feature Importance: • Include HasCrCard as an important feature in the churn prediction model. • Targeted Strategies: • Design targeted retention strategies for customers without credit cards. • Further Analysis: • Investigate other features in combination with HasCrCard to understand specific segments at higher risk of churn. By considering these key insights, you can build a more accurate and effective model for predicting customer churn and design targeted interventions to reduce churn rates. If you need further assistance or more detailed analysis, feel free to ask!

Long Given Bivariate Analysis¶

Exited Vs Geography¶

In [ ]:
stacked_barplot(ds, "Geography", "Exited" )
Exited        0     1    All
Geography                   
All        7963  2037  10000
Germany    1695   814   2509
France     4204   810   5014
Spain      2064   413   2477
------------------------------------------------------------------------------------------------------------------------

Exited Vs Gender¶

In [ ]:
stacked_barplot(ds, "Gender", "Exited")                   ## Complete the code to plot stacked barplot for Exited and Gender
Exited     0     1    All
Gender                   
All     7963  2037  10000
Female  3404  1139   4543
Male    4559   898   5457
------------------------------------------------------------------------------------------------------------------------

Exited Vs Has Credit Card¶

In [ ]:
stacked_barplot(ds, "HasCrCard", "Exited")                   ## Complete the code to plot stacked barplot for Exited and Has credit card
Exited        0     1    All
HasCrCard                   
All        7963  2037  10000
1          5631  1424   7055
0          2332   613   2945
------------------------------------------------------------------------------------------------------------------------

Exited Vs Is active member¶

In [ ]:
stacked_barplot(ds, "IsActiveMember", "Exited")                   ## Complete the code to plot stacked barplot for Exited and Is active member
Exited             0     1    All
IsActiveMember                   
All             7963  2037  10000
0               3547  1302   4849
1               4416   735   5151
------------------------------------------------------------------------------------------------------------------------

Exited Vs Credit Score¶

In [ ]:
plt.figure(figsize=(5,5))
sns.boxplot(y='CreditScore',x='Exited',data=ds)
plt.show()

Exited Vs Age¶

In [ ]:
plt.figure(figsize=(5,5))
sns.boxplot(y='Age',x='Exited',data=ds)               ## Complete the code to plot the boxplot for Exited and Age
plt.show()

Exited Vs Tenure¶

In [ ]:
plt.figure(figsize=(5,5))
sns.boxplot(y='Tenure',x='Exited',data=ds)               ## Complete the code to plot the boxplot for Exited and Tenure
plt.show()

Exited Vs Balance¶

In [ ]:
plt.figure(figsize=(5,5))
sns.boxplot(y='Balance',x='Exited',data=ds)               ## Complete the code to plot the boxplot for Exited and Balance
plt.show()

Exited Vs Number of Products¶

In [ ]:
plt.figure(figsize=(5,5))
sns.boxplot(y='NumOfProducts',x='Exited',data=ds)               ## Complete the code to plot the boxplot for Exited and Number of products
plt.show()

Exited Vs Estimated Salary¶

In [ ]:
plt.figure(figsize=(5,5))
sns.boxplot(y='EstimatedSalary',x='Exited',data=ds)               ## Complete the code to plot the boxplot for Exited and Estimated Salary
plt.show()

Data Preprocessing¶

Dummy Variable Creation¶

In [ ]:
ds = pd.get_dummies(ds,columns=ds.select_dtypes(include=["object"]).columns.tolist(),drop_first=True,dtype=float)

Train-validation-test Split¶

In [ ]:
X = ds.drop(['Exited'],axis=1) # Credit Score through Estimated Salary
y = ds['Exited'] # Exited
In [ ]:
# Splitting the dataset into the Training and Testing set.

X_large, X_test, y_large, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42,stratify=y,shuffle = True) ## Complete the code to Split the X and y and obtain test set
In [ ]:
# Splitting the dataset into the Training and Testing set.

X_train, X_val, y_train, y_val = train_test_split(X_large, y_large, test_size = 0.2, random_state = 42,stratify=y_large, shuffle = True) ## complete the code to Split X_large and y_large to obtain train and validation sets
In [ ]:
print(X_train.shape, X_val.shape, X_test.shape)
(6400, 11) (1600, 11) (2000, 11)
In [ ]:
print(y_train.shape, y_val.shape, y_test.shape)
(6400,) (1600,) (2000,)

Data Normalization¶

Since all the numerical values are on a different scale, so we will be scaling all the numerical values to bring them to the same scale.

In [ ]:
# creating an instance of the standard scaler
sc = StandardScaler()

X_train[cols_list] = sc.fit_transform(X_train[cols_list])
X_val[cols_list] = sc.transform(X_val[cols_list])    ## Complete the code to specify the columns to normalize
X_test[cols_list] = sc.transform(X_test[cols_list])    ## Complete the code to specify the columns to normalize

Model Building¶

Model Evaluation Criterion¶

Write down the logic for choosing the metric that would be the best metric for this business scenario.

-Recommendation Given the business scenario where the bank wants to prevent customer churn:

Recall should be a priority metric to ensure most potential churners are identified. F1 Score is also very useful as it balances the need to identify churners (recall) and the need to avoid false positives (precision).

Let's create a function for plotting the confusion matrix

In [ ]:
def make_confusion_matrix(actual_targets, predicted_targets):
    """
    To plot the confusion_matrix with percentages

    actual_targets: actual target (dependent) variable values
    predicted_targets: predicted target (dependent) variable values
    """
    cm = confusion_matrix(actual_targets, predicted_targets)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(cm.shape[0], cm.shape[1])

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")

Let's create two blank dataframes that will store the recall values for all the models we build.

In [ ]:
train_metric_df = pd.DataFrame(columns=["recall"])
valid_metric_df = pd.DataFrame(columns=["recall"])

Neural Network with SGD Optimizer¶

In [ ]:
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [ ]:
#Initializing the neural network
model_0 = Sequential()
# Adding the input layer with 64 neurons and relu as activation function
model_0.add(Dense(64, activation='relu', input_dim = X_train.shape[1]))
# Complete the code to add a hidden layer (specify the # of neurons and the activation function)
model_0.add(Dense(32, activation='relu'))
# Complete the code to add the output layer with the number of neurons required.
model_0.add(Dense(1, activation='sigmoid'))
In [ ]:
#Complete the code to use SGD as the optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
In [ ]:
## Complete the code to compile the model with binary cross entropy as loss function and recall as the metric.
model_0.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [ ]:
model_0.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 64)                768       
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 2881 (11.25 KB)
Trainable params: 2881 (11.25 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [ ]:
# Fitting the ANN

history_0 = model_0.fit(
    X_train, y_train,
    batch_size=32,    ## Complete the code to specify the batch size to use
    validation_data=(X_val,y_val),
    epochs=50,    ## Complete the code to specify the number of epochs
    verbose=1
)
Epoch 1/50
200/200 [==============================] - 1s 4ms/step - loss: 0.6131 - recall: 0.0729 - val_loss: 0.5794 - val_recall: 0.0000e+00
Epoch 2/50
200/200 [==============================] - 1s 7ms/step - loss: 0.5594 - recall: 0.0023 - val_loss: 0.5425 - val_recall: 0.0000e+00
Epoch 3/50
200/200 [==============================] - 3s 13ms/step - loss: 0.5310 - recall: 0.0000e+00 - val_loss: 0.5224 - val_recall: 0.0000e+00
Epoch 4/50
200/200 [==============================] - 3s 13ms/step - loss: 0.5150 - recall: 0.0000e+00 - val_loss: 0.5106 - val_recall: 0.0000e+00
Epoch 5/50
200/200 [==============================] - 1s 6ms/step - loss: 0.5052 - recall: 0.0000e+00 - val_loss: 0.5029 - val_recall: 0.0000e+00
Epoch 6/50
200/200 [==============================] - 2s 8ms/step - loss: 0.4984 - recall: 0.0000e+00 - val_loss: 0.4974 - val_recall: 0.0000e+00
Epoch 7/50
200/200 [==============================] - 1s 6ms/step - loss: 0.4933 - recall: 0.0000e+00 - val_loss: 0.4931 - val_recall: 0.0000e+00
Epoch 8/50
200/200 [==============================] - 1s 6ms/step - loss: 0.4891 - recall: 0.0000e+00 - val_loss: 0.4894 - val_recall: 0.0000e+00
Epoch 9/50
200/200 [==============================] - 1s 7ms/step - loss: 0.4855 - recall: 0.0000e+00 - val_loss: 0.4861 - val_recall: 0.0000e+00
Epoch 10/50
200/200 [==============================] - 1s 7ms/step - loss: 0.4822 - recall: 0.0000e+00 - val_loss: 0.4831 - val_recall: 0.0000e+00
Epoch 11/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4791 - recall: 0.0000e+00 - val_loss: 0.4804 - val_recall: 0.0000e+00
Epoch 12/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4763 - recall: 0.0000e+00 - val_loss: 0.4778 - val_recall: 0.0000e+00
Epoch 13/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4736 - recall: 0.0000e+00 - val_loss: 0.4754 - val_recall: 0.0000e+00
Epoch 14/50
200/200 [==============================] - 1s 5ms/step - loss: 0.4711 - recall: 7.6687e-04 - val_loss: 0.4731 - val_recall: 0.0000e+00
Epoch 15/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4687 - recall: 7.6687e-04 - val_loss: 0.4709 - val_recall: 0.0000e+00
Epoch 16/50
200/200 [==============================] - 1s 5ms/step - loss: 0.4664 - recall: 7.6687e-04 - val_loss: 0.4689 - val_recall: 0.0000e+00
Epoch 17/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4643 - recall: 0.0015 - val_loss: 0.4670 - val_recall: 0.0000e+00
Epoch 18/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4622 - recall: 0.0023 - val_loss: 0.4652 - val_recall: 0.0031
Epoch 19/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4603 - recall: 0.0023 - val_loss: 0.4635 - val_recall: 0.0031
Epoch 20/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4585 - recall: 0.0046 - val_loss: 0.4618 - val_recall: 0.0031
Epoch 21/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4567 - recall: 0.0054 - val_loss: 0.4603 - val_recall: 0.0031
Epoch 22/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4550 - recall: 0.0061 - val_loss: 0.4588 - val_recall: 0.0061
Epoch 23/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4534 - recall: 0.0069 - val_loss: 0.4574 - val_recall: 0.0061
Epoch 24/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4519 - recall: 0.0100 - val_loss: 0.4561 - val_recall: 0.0092
Epoch 25/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4504 - recall: 0.0138 - val_loss: 0.4548 - val_recall: 0.0092
Epoch 26/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4490 - recall: 0.0176 - val_loss: 0.4536 - val_recall: 0.0123
Epoch 27/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4477 - recall: 0.0215 - val_loss: 0.4525 - val_recall: 0.0184
Epoch 28/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4464 - recall: 0.0268 - val_loss: 0.4514 - val_recall: 0.0245
Epoch 29/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4452 - recall: 0.0299 - val_loss: 0.4504 - val_recall: 0.0307
Epoch 30/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4440 - recall: 0.0353 - val_loss: 0.4494 - val_recall: 0.0368
Epoch 31/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4429 - recall: 0.0383 - val_loss: 0.4485 - val_recall: 0.0399
Epoch 32/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4418 - recall: 0.0429 - val_loss: 0.4476 - val_recall: 0.0429
Epoch 33/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4408 - recall: 0.0452 - val_loss: 0.4468 - val_recall: 0.0429
Epoch 34/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4398 - recall: 0.0491 - val_loss: 0.4460 - val_recall: 0.0460
Epoch 35/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4389 - recall: 0.0537 - val_loss: 0.4452 - val_recall: 0.0460
Epoch 36/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4380 - recall: 0.0583 - val_loss: 0.4445 - val_recall: 0.0521
Epoch 37/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4372 - recall: 0.0667 - val_loss: 0.4438 - val_recall: 0.0521
Epoch 38/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4363 - recall: 0.0713 - val_loss: 0.4432 - val_recall: 0.0613
Epoch 39/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4355 - recall: 0.0759 - val_loss: 0.4426 - val_recall: 0.0613
Epoch 40/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4348 - recall: 0.0874 - val_loss: 0.4420 - val_recall: 0.0613
Epoch 41/50
200/200 [==============================] - 1s 4ms/step - loss: 0.4341 - recall: 0.0890 - val_loss: 0.4415 - val_recall: 0.0644
Epoch 42/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4334 - recall: 0.0928 - val_loss: 0.4410 - val_recall: 0.0675
Epoch 43/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4327 - recall: 0.0982 - val_loss: 0.4405 - val_recall: 0.0675
Epoch 44/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4321 - recall: 0.1035 - val_loss: 0.4401 - val_recall: 0.0736
Epoch 45/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4315 - recall: 0.1058 - val_loss: 0.4396 - val_recall: 0.0736
Epoch 46/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4309 - recall: 0.1089 - val_loss: 0.4392 - val_recall: 0.0767
Epoch 47/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4303 - recall: 0.1158 - val_loss: 0.4388 - val_recall: 0.0859
Epoch 48/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4298 - recall: 0.1196 - val_loss: 0.4384 - val_recall: 0.0859
Epoch 49/50
200/200 [==============================] - 1s 3ms/step - loss: 0.4292 - recall: 0.1212 - val_loss: 0.4381 - val_recall: 0.0890
Epoch 50/50
200/200 [==============================] - 0s 2ms/step - loss: 0.4287 - recall: 0.1273 - val_loss: 0.4377 - val_recall: 0.0920

Loss function

In [ ]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_0.history['loss'])
plt.plot(history_0.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Recall

In [ ]:
#Plotting Train recall vs Validation recall
plt.plot(history_0.history['recall'])
plt.plot(history_0.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
#Predicting the results using best as a threshold
y_train_pred = model_0.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
Out[ ]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])
In [ ]:
#Predicting the results using best as a threshold
y_val_pred = model_0.predict(X_val)    ## Complete the code to make prediction on the validation set
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 2ms/step
Out[ ]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])
In [ ]:
model_name = "NN with SGD"

train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)

Classification report

In [ ]:
#lassification report
cr = classification_report(y_train, y_train_pred)
print("Classification Report for NN with SGD as optimizer on training set")
print(cr)
Classification Report for NN with SGD as optimizer
              precision    recall  f1-score   support

           0       0.82      0.98      0.89      5096
           1       0.65      0.13      0.21      1304

    accuracy                           0.81      6400
   macro avg       0.73      0.56      0.55      6400
weighted avg       0.78      0.81      0.75      6400

Classification Report for NN with SGD as optimizer precision recall f1-score support

       0       0.82      0.98      0.89      5096
       1       0.65      0.13      0.21      1304

accuracy                           0.81      6400

macro avg 0.73 0.56 0.55 6400 weighted avg 0.78 0.81 0.75 6400

In [ ]:
#classification report
cr=classification_report(y_val, y_val_pred)    ## Complete the code to check the model's performance on the validation set
print("Classification Report for NN with SGD as optimizer on validation set")
print(cr)
Classification Report for NN with SGD as optimizer on validation set
              precision    recall  f1-score   support

           0       0.81      0.98      0.89      1274
           1       0.59      0.09      0.16       326

    accuracy                           0.80      1600
   macro avg       0.70      0.54      0.52      1600
weighted avg       0.76      0.80      0.74      1600

Classification Report for NN with SGD as optimizer on validation set precision recall f1-score support

       0       0.81      0.98      0.89      1274
       1       0.59      0.09      0.16       326

accuracy                           0.80      1600

macro avg 0.70 0.54 0.52 1600 weighted avg 0.76 0.80 0.74 1600

Confusion matrix

In [ ]:
make_confusion_matrix(y_train, y_train_pred)
In [ ]:
make_confusion_matrix(y_val, y_val_pred)    ## Complete the code to check the model's performance on the validation set

Observations for Neural Network with SGD as Optimizer Training and Validation Loss:

The training and validation loss both decrease steadily over the 50 epochs, indicating that the model is learning and improving its predictions. The training loss is slightly lower than the validation loss, but the gap is not significant, suggesting that the model is not overfitting. Training and Validation Recall:

The recall starts very low for both training and validation sets but gradually improves. By the end of the training, the recall on the validation set is around 0.092, indicating that the model is not very effective at identifying the positive class (churners). Confusion Matrix:

For the training set, the model has a high true negative rate (correctly identifying non-churners) but a low true positive rate (correctly identifying churners). Similarly, for the validation set, the true negative rate is high, but the true positive rate remains low. Classification Report:

The precision for the positive class (churners) is relatively low, especially for the validation set, which means there are many false positives. The recall for the positive class is very low, indicating that the model misses many actual churners. The overall accuracy is good (around 80% for both training and validation), but this is primarily due to the high accuracy in predicting the non-churners. Summary of Performance The model with SGD optimizer struggles to identify churners accurately, as evidenced by the low recall for the positive class. While the overall accuracy is high, this is misleading due to the class imbalance and the model's inability to correctly identify a significant portion of churners. The performance indicates a need for further tuning or changing the model to improve recall for the positive class, which is critical for churn prediction tasks.

Model Performance Improvement¶

Neural Network with Adam Optimizer¶

In [ ]:
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [ ]:
#Initializing the neural network
model_1 = Sequential()
#Complete the code to add a input layer (specify the # of neurons and activation function)
model_1.add(Dense(64,activation='relu',input_dim = X_train.shape[1]))
#Complete the code to add a hidden layer (specify the # of neurons and activation function)
model_1.add(Dense(32,activation='relu'))
#Complete the code to add a output layer with the required number of neurons and relu as activation function
model_1.add(Dense(1, activation = 'sigmoid'))
In [ ]:
#Complete the code to use Adam as the optimizer.
optimizer = Adam(learning_rate=0.001)

# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
In [ ]:
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [ ]:
model_1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 64)                768       
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 2881 (11.25 KB)
Trainable params: 2881 (11.25 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [ ]:
#Fitting the ANN
history_1 = model_1.fit(
    X_train,y_train,
    batch_size= 32, ## Complete the code to specify the batch size to use
    validation_data=(X_val,y_val),
    epochs=50, ## Complete the code to specify the number of epochs
    verbose=1
)
Epoch 1/50
200/200 [==============================] - 4s 10ms/step - loss: 0.4512 - recall_1: 0.1020 - val_loss: 0.4334 - val_recall_1: 0.1442
Epoch 2/50
200/200 [==============================] - 2s 10ms/step - loss: 0.4101 - recall_1: 0.2684 - val_loss: 0.4177 - val_recall_1: 0.2669
Epoch 3/50
200/200 [==============================] - 2s 8ms/step - loss: 0.3970 - recall_1: 0.3313 - val_loss: 0.4077 - val_recall_1: 0.3374
Epoch 4/50
200/200 [==============================] - 2s 9ms/step - loss: 0.3865 - recall_1: 0.3528 - val_loss: 0.4027 - val_recall_1: 0.4202
Epoch 5/50
200/200 [==============================] - 1s 4ms/step - loss: 0.3752 - recall_1: 0.3873 - val_loss: 0.3967 - val_recall_1: 0.2822
Epoch 6/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3660 - recall_1: 0.4011 - val_loss: 0.3878 - val_recall_1: 0.4172
Epoch 7/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3592 - recall_1: 0.4340 - val_loss: 0.3799 - val_recall_1: 0.3681
Epoch 8/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3522 - recall_1: 0.4210 - val_loss: 0.3750 - val_recall_1: 0.4264
Epoch 9/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3462 - recall_1: 0.4509 - val_loss: 0.3731 - val_recall_1: 0.3436
Epoch 10/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3403 - recall_1: 0.4563 - val_loss: 0.3632 - val_recall_1: 0.4601
Epoch 11/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3349 - recall_1: 0.4732 - val_loss: 0.3714 - val_recall_1: 0.5429
Epoch 12/50
200/200 [==============================] - 1s 4ms/step - loss: 0.3319 - recall_1: 0.4724 - val_loss: 0.3633 - val_recall_1: 0.4693
Epoch 13/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3282 - recall_1: 0.4946 - val_loss: 0.3606 - val_recall_1: 0.4141
Epoch 14/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3249 - recall_1: 0.4900 - val_loss: 0.3603 - val_recall_1: 0.3558
Epoch 15/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3223 - recall_1: 0.4908 - val_loss: 0.3608 - val_recall_1: 0.3834
Epoch 16/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3208 - recall_1: 0.5000 - val_loss: 0.3572 - val_recall_1: 0.4233
Epoch 17/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3165 - recall_1: 0.5084 - val_loss: 0.3543 - val_recall_1: 0.4448
Epoch 18/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3138 - recall_1: 0.5146 - val_loss: 0.3597 - val_recall_1: 0.3896
Epoch 19/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3128 - recall_1: 0.5184 - val_loss: 0.3707 - val_recall_1: 0.3742
Epoch 20/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3108 - recall_1: 0.5192 - val_loss: 0.3559 - val_recall_1: 0.4540
Epoch 21/50
200/200 [==============================] - 1s 4ms/step - loss: 0.3095 - recall_1: 0.5176 - val_loss: 0.3557 - val_recall_1: 0.4816
Epoch 22/50
200/200 [==============================] - 1s 4ms/step - loss: 0.3093 - recall_1: 0.5261 - val_loss: 0.3561 - val_recall_1: 0.4294
Epoch 23/50
200/200 [==============================] - 1s 4ms/step - loss: 0.3065 - recall_1: 0.5123 - val_loss: 0.3581 - val_recall_1: 0.4571
Epoch 24/50
200/200 [==============================] - 1s 5ms/step - loss: 0.3049 - recall_1: 0.5291 - val_loss: 0.3574 - val_recall_1: 0.4509
Epoch 25/50
200/200 [==============================] - 1s 4ms/step - loss: 0.3038 - recall_1: 0.5376 - val_loss: 0.3671 - val_recall_1: 0.4110
Epoch 26/50
200/200 [==============================] - 1s 4ms/step - loss: 0.3028 - recall_1: 0.5299 - val_loss: 0.3598 - val_recall_1: 0.4509
Epoch 27/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3015 - recall_1: 0.5291 - val_loss: 0.3626 - val_recall_1: 0.4325
Epoch 28/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3009 - recall_1: 0.5437 - val_loss: 0.3624 - val_recall_1: 0.4417
Epoch 29/50
200/200 [==============================] - 1s 3ms/step - loss: 0.3010 - recall_1: 0.5238 - val_loss: 0.3643 - val_recall_1: 0.5245
Epoch 30/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2964 - recall_1: 0.5406 - val_loss: 0.3604 - val_recall_1: 0.5215
Epoch 31/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2967 - recall_1: 0.5483 - val_loss: 0.3649 - val_recall_1: 0.4080
Epoch 32/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2958 - recall_1: 0.5468 - val_loss: 0.3638 - val_recall_1: 0.4509
Epoch 33/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2945 - recall_1: 0.5422 - val_loss: 0.3630 - val_recall_1: 0.4601
Epoch 34/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2950 - recall_1: 0.5422 - val_loss: 0.3635 - val_recall_1: 0.4816
Epoch 35/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2918 - recall_1: 0.5621 - val_loss: 0.3613 - val_recall_1: 0.4632
Epoch 36/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2913 - recall_1: 0.5460 - val_loss: 0.3661 - val_recall_1: 0.4724
Epoch 37/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2901 - recall_1: 0.5575 - val_loss: 0.3629 - val_recall_1: 0.4663
Epoch 38/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2880 - recall_1: 0.5475 - val_loss: 0.3660 - val_recall_1: 0.5583
Epoch 39/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2882 - recall_1: 0.5606 - val_loss: 0.3694 - val_recall_1: 0.4571
Epoch 40/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2877 - recall_1: 0.5575 - val_loss: 0.3750 - val_recall_1: 0.5245
Epoch 41/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2870 - recall_1: 0.5529 - val_loss: 0.3724 - val_recall_1: 0.3957
Epoch 42/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2846 - recall_1: 0.5613 - val_loss: 0.3651 - val_recall_1: 0.4693
Epoch 43/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2846 - recall_1: 0.5652 - val_loss: 0.3711 - val_recall_1: 0.4417
Epoch 44/50
200/200 [==============================] - 1s 4ms/step - loss: 0.2834 - recall_1: 0.5529 - val_loss: 0.3743 - val_recall_1: 0.4755
Epoch 45/50
200/200 [==============================] - 1s 4ms/step - loss: 0.2828 - recall_1: 0.5759 - val_loss: 0.3716 - val_recall_1: 0.5092
Epoch 46/50
200/200 [==============================] - 1s 4ms/step - loss: 0.2800 - recall_1: 0.5629 - val_loss: 0.3701 - val_recall_1: 0.4417
Epoch 47/50
200/200 [==============================] - 1s 5ms/step - loss: 0.2788 - recall_1: 0.5729 - val_loss: 0.3807 - val_recall_1: 0.4264
Epoch 48/50
200/200 [==============================] - 1s 4ms/step - loss: 0.2807 - recall_1: 0.5736 - val_loss: 0.3710 - val_recall_1: 0.4755
Epoch 49/50
200/200 [==============================] - 1s 5ms/step - loss: 0.2774 - recall_1: 0.5805 - val_loss: 0.3760 - val_recall_1: 0.4202
Epoch 50/50
200/200 [==============================] - 1s 3ms/step - loss: 0.2780 - recall_1: 0.5790 - val_loss: 0.3700 - val_recall_1: 0.4571

Loss function

In [ ]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_1.history['loss'])
plt.plot(history_1.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Recall

In [ ]:
#Plotting Train recall vs Validation recall
plt.plot(history_1.history['recall'])
plt.plot(history_1.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-66-2543e1ea2246> in <cell line: 2>()
      1 #Plotting Train recall vs Validation recall
----> 2 plt.plot(history_1.history['recall'])
      3 plt.plot(history_1.history['val_recall'])
      4 plt.title('model recall')
      5 plt.ylabel('Recall')

KeyError: 'recall'
In [ ]:
#Predicting the results using 0.5 as the threshold
y_train_pred = model_1.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
Out[ ]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [ ]:
#Predicting the results using 0.5 as the threshold
y_val_pred = model_1.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 3ms/step
Out[ ]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [ True]])
In [ ]:
model_name = "NN with Adam"

train_metric_df.loc[model_name] = recall_score(y_train,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)

Classification report

In [ ]:
#lassification report

cr=classification_report(y_train,y_train_pred)
print("Classification Report for NN with Adam as optimizer on training set")
print(cr)
Classification Report for NN with SGD as optimizer on training set
              precision    recall  f1-score   support

           0       0.90      0.97      0.93      5096
           1       0.82      0.60      0.69      1304

    accuracy                           0.89      6400
   macro avg       0.86      0.78      0.81      6400
weighted avg       0.89      0.89      0.88      6400

In [ ]:
#classification report
cr=classification_report(y_val,y_val_pred)  ## Complete the code to check the model's performance on the validation set
print("Classification Report for NN with Adam as optimizer on validation set")
print(cr)
              precision    recall  f1-score   support

           0       0.87      0.95      0.91      1274
           1       0.70      0.46      0.55       326

    accuracy                           0.85      1600
   macro avg       0.78      0.70      0.73      1600
weighted avg       0.84      0.85      0.84      1600

Confusion matrix

In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_train, y_train_pred)
In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred)  ## Complete the code to check the model's performance on the validation set

Observations and Key Takeaways for NN with Adam Optimizer Learning Curves Model Loss:

The training loss consistently decreases over the epochs, indicating that the model is learning. The validation loss shows a decreasing trend initially but starts to fluctuate after around 20 epochs, suggesting potential overfitting or the need for further tuning. Model Recall:

The recall for both training and validation sets shows improvement over the epochs. The validation recall is more volatile compared to the training recall, indicating variability in how well the model identifies positive cases (churners) on the validation set. Confusion Matrices Training Set:

True Negatives (TN): 4921 False Positives (FP): 175 False Negatives (FN): 525 True Positives (TP): 779 Validation Set:

True Negatives (TN): 1209 False Positives (FP): 65 False Negatives (FN): 177 True Positives (TP): 149 Classification Reports Training Set:

Precision: Class 0: 0.90 Class 1: 0.82 Recall: Class 0: 0.97 Class 1: 0.60 F1-Score: Class 0: 0.93 Class 1: 0.69 Accuracy: 0.89 Validation Set:

Precision: Class 0: 0.87 Class 1: 0.70 Recall: Class 0: 0.95 Class 1: 0.46 F1-Score: Class 0: 0.91 Class 1: 0.55 Accuracy: 0.85 Key Takeaways Improved Performance:

The model with the Adam optimizer shows improved recall for the positive class (churners) compared to the model with the SGD optimizer. Both precision and recall for the positive class are better with Adam, leading to higher F1-scores. Class Imbalance Impact:

The recall for the positive class (churners) is still lower than desired, indicating the persistent challenge of class imbalance. Precision for the positive class is relatively high, suggesting that when the model predicts churn, it is often correct. Generalization:

The model generalizes better than the previous model with SGD, as indicated by the more consistent performance metrics across the training and validation sets. However, the validation recall's volatility indicates that the model might benefit from further regularization or different dropout rates. Potential Overfitting:

The fluctuations in validation loss and recall suggest potential overfitting after a certain number of epochs. Implementing early stopping based on validation performance could help prevent overfitting.

Neural Network with Adam Optimizer and Dropout¶

In [ ]:
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [ ]:
# Initializing the neural network
model_2 = Sequential()

# Adding the input layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu', input_dim=X_train.shape[1]))

# Adding dropout with ratio of 0.2
model_2.add(Dropout(0.2))

# Adding a hidden layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu'))

# Adding another hidden layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu'))

# Adding dropout with ratio of 0.1
model_2.add(Dropout(0.1))

# Adding another hidden layer with 32 neurons and relu as activation function
model_2.add(Dense(32, activation='relu'))

# Adding the output layer with 1 neuron and sigmoid as activation function
model_2.add(Dense(1, activation='sigmoid'))
In [ ]:
#Complete the code to use Adam as the optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
In [ ]:
## Complete the code to compile the model with binary cross entropy as loss function and recall as the metric.
model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [ ]:
# Summary of the model
model_2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dropout (Dropout)           (None, 32)                0         
                                                                 
 dense_1 (Dense)             (None, 32)                1056      
                                                                 
 dense_2 (Dense)             (None, 32)                1056      
                                                                 
 dropout_1 (Dropout)         (None, 32)                0         
                                                                 
 dense_3 (Dense)             (None, 32)                1056      
                                                                 
 dense_4 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 3585 (14.00 KB)
Trainable params: 3585 (14.00 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [ ]:
# Fitting the ANN with batch_size = 32 and 100 epochs
history_2 = model_2.fit(
    X_train, y_train,
    batch_size=32,  ## Complete the code to specify the batch size.
    epochs=100, ## Complete the code to specify the # of epochs.
    verbose=1,
    validation_data=(X_val, y_val)
)
Epoch 1/100
200/200 [==============================] - 2s 5ms/step - loss: 0.4842 - recall: 0.0284 - val_loss: 0.4426 - val_recall: 0.0000e+00
Epoch 2/100
200/200 [==============================] - 1s 4ms/step - loss: 0.4413 - recall: 0.0544 - val_loss: 0.4300 - val_recall: 0.0399
Epoch 3/100
200/200 [==============================] - 1s 4ms/step - loss: 0.4323 - recall: 0.1342 - val_loss: 0.4226 - val_recall: 0.2055
Epoch 4/100
200/200 [==============================] - 1s 4ms/step - loss: 0.4236 - recall: 0.2109 - val_loss: 0.4202 - val_recall: 0.3712
Epoch 5/100
200/200 [==============================] - 1s 4ms/step - loss: 0.4195 - recall: 0.2791 - val_loss: 0.4132 - val_recall: 0.2086
Epoch 6/100
200/200 [==============================] - 1s 5ms/step - loss: 0.4129 - recall: 0.2791 - val_loss: 0.4052 - val_recall: 0.3129
Epoch 7/100
200/200 [==============================] - 1s 6ms/step - loss: 0.4065 - recall: 0.3044 - val_loss: 0.3999 - val_recall: 0.3006
Epoch 8/100
200/200 [==============================] - 1s 6ms/step - loss: 0.4005 - recall: 0.3052 - val_loss: 0.4024 - val_recall: 0.3190
Epoch 9/100
200/200 [==============================] - 1s 6ms/step - loss: 0.3990 - recall: 0.2968 - val_loss: 0.4014 - val_recall: 0.2301
Epoch 10/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3971 - recall: 0.2983 - val_loss: 0.3965 - val_recall: 0.3067
Epoch 11/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3948 - recall: 0.3213 - val_loss: 0.3971 - val_recall: 0.4018
Epoch 12/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3911 - recall: 0.3566 - val_loss: 0.3940 - val_recall: 0.3006
Epoch 13/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3896 - recall: 0.3512 - val_loss: 0.3895 - val_recall: 0.3344
Epoch 14/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3895 - recall: 0.3413 - val_loss: 0.3886 - val_recall: 0.2883
Epoch 15/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3802 - recall: 0.3666 - val_loss: 0.3826 - val_recall: 0.3834
Epoch 16/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3787 - recall: 0.3781 - val_loss: 0.3836 - val_recall: 0.3497
Epoch 17/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3773 - recall: 0.3850 - val_loss: 0.3773 - val_recall: 0.3589
Epoch 18/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3750 - recall: 0.3919 - val_loss: 0.3752 - val_recall: 0.3282
Epoch 19/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3653 - recall: 0.4072 - val_loss: 0.3709 - val_recall: 0.3221
Epoch 20/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3667 - recall: 0.4018 - val_loss: 0.3656 - val_recall: 0.4080
Epoch 21/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3603 - recall: 0.4195 - val_loss: 0.3646 - val_recall: 0.3834
Epoch 22/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3593 - recall: 0.4187 - val_loss: 0.3579 - val_recall: 0.4172
Epoch 23/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3586 - recall: 0.4164 - val_loss: 0.3592 - val_recall: 0.4049
Epoch 24/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3550 - recall: 0.4310 - val_loss: 0.3558 - val_recall: 0.4325
Epoch 25/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3474 - recall: 0.4479 - val_loss: 0.3569 - val_recall: 0.4018
Epoch 26/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3496 - recall: 0.4317 - val_loss: 0.3529 - val_recall: 0.4387
Epoch 27/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3496 - recall: 0.4325 - val_loss: 0.3524 - val_recall: 0.4080
Epoch 28/100
200/200 [==============================] - 1s 6ms/step - loss: 0.3392 - recall: 0.4678 - val_loss: 0.3500 - val_recall: 0.4356
Epoch 29/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3459 - recall: 0.4310 - val_loss: 0.3559 - val_recall: 0.5092
Epoch 30/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3388 - recall: 0.4647 - val_loss: 0.3507 - val_recall: 0.4816
Epoch 31/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3437 - recall: 0.4502 - val_loss: 0.3496 - val_recall: 0.3926
Epoch 32/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3437 - recall: 0.4371 - val_loss: 0.3497 - val_recall: 0.4294
Epoch 33/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3375 - recall: 0.4555 - val_loss: 0.3498 - val_recall: 0.4816
Epoch 34/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3339 - recall: 0.4532 - val_loss: 0.3504 - val_recall: 0.4785
Epoch 35/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3339 - recall: 0.4632 - val_loss: 0.3565 - val_recall: 0.5000
Epoch 36/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3371 - recall: 0.4586 - val_loss: 0.3498 - val_recall: 0.4479
Epoch 37/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3324 - recall: 0.4839 - val_loss: 0.3472 - val_recall: 0.4479
Epoch 38/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3329 - recall: 0.4571 - val_loss: 0.3520 - val_recall: 0.4908
Epoch 39/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3326 - recall: 0.4693 - val_loss: 0.3535 - val_recall: 0.4908
Epoch 40/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3344 - recall: 0.4647 - val_loss: 0.3545 - val_recall: 0.4693
Epoch 41/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3344 - recall: 0.4509 - val_loss: 0.3488 - val_recall: 0.4663
Epoch 42/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3328 - recall: 0.4563 - val_loss: 0.3494 - val_recall: 0.4509
Epoch 43/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3312 - recall: 0.4548 - val_loss: 0.3490 - val_recall: 0.4387
Epoch 44/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3260 - recall: 0.4716 - val_loss: 0.3534 - val_recall: 0.4908
Epoch 45/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3262 - recall: 0.4663 - val_loss: 0.3518 - val_recall: 0.4601
Epoch 46/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3272 - recall: 0.4701 - val_loss: 0.3543 - val_recall: 0.4571
Epoch 47/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3245 - recall: 0.4770 - val_loss: 0.3588 - val_recall: 0.4172
Epoch 48/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3312 - recall: 0.4540 - val_loss: 0.3545 - val_recall: 0.4387
Epoch 49/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3264 - recall: 0.4686 - val_loss: 0.3602 - val_recall: 0.4724
Epoch 50/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3253 - recall: 0.4770 - val_loss: 0.3532 - val_recall: 0.4693
Epoch 51/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3283 - recall: 0.4663 - val_loss: 0.3545 - val_recall: 0.4724
Epoch 52/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3213 - recall: 0.4709 - val_loss: 0.3522 - val_recall: 0.4632
Epoch 53/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3222 - recall: 0.4847 - val_loss: 0.3556 - val_recall: 0.4785
Epoch 54/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3264 - recall: 0.4801 - val_loss: 0.3534 - val_recall: 0.4540
Epoch 55/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3251 - recall: 0.4877 - val_loss: 0.3539 - val_recall: 0.4049
Epoch 56/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3205 - recall: 0.4824 - val_loss: 0.3549 - val_recall: 0.4233
Epoch 57/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3231 - recall: 0.4655 - val_loss: 0.3522 - val_recall: 0.4356
Epoch 58/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3215 - recall: 0.4793 - val_loss: 0.3532 - val_recall: 0.4448
Epoch 59/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3209 - recall: 0.4747 - val_loss: 0.3580 - val_recall: 0.5215
Epoch 60/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3198 - recall: 0.4946 - val_loss: 0.3538 - val_recall: 0.4110
Epoch 61/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3228 - recall: 0.4762 - val_loss: 0.3573 - val_recall: 0.4755
Epoch 62/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3194 - recall: 0.4632 - val_loss: 0.3573 - val_recall: 0.4632
Epoch 63/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3225 - recall: 0.4778 - val_loss: 0.3570 - val_recall: 0.4509
Epoch 64/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3233 - recall: 0.4770 - val_loss: 0.3550 - val_recall: 0.4479
Epoch 65/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3198 - recall: 0.4778 - val_loss: 0.3559 - val_recall: 0.4509
Epoch 66/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3126 - recall: 0.4900 - val_loss: 0.3560 - val_recall: 0.4540
Epoch 67/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3211 - recall: 0.4678 - val_loss: 0.3595 - val_recall: 0.5123
Epoch 68/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3226 - recall: 0.4770 - val_loss: 0.3599 - val_recall: 0.5031
Epoch 69/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3186 - recall: 0.4839 - val_loss: 0.3572 - val_recall: 0.4847
Epoch 70/100
200/200 [==============================] - 1s 4ms/step - loss: 0.3196 - recall: 0.4854 - val_loss: 0.3576 - val_recall: 0.4785
Epoch 71/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3167 - recall: 0.4686 - val_loss: 0.3553 - val_recall: 0.5092
Epoch 72/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3132 - recall: 0.5000 - val_loss: 0.3560 - val_recall: 0.4816
Epoch 73/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3166 - recall: 0.4931 - val_loss: 0.3538 - val_recall: 0.4847
Epoch 74/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3161 - recall: 0.4808 - val_loss: 0.3585 - val_recall: 0.4724
Epoch 75/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3173 - recall: 0.4831 - val_loss: 0.3547 - val_recall: 0.3988
Epoch 76/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3166 - recall: 0.4747 - val_loss: 0.3507 - val_recall: 0.4663
Epoch 77/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3173 - recall: 0.4870 - val_loss: 0.3550 - val_recall: 0.4693
Epoch 78/100
200/200 [==============================] - 0s 2ms/step - loss: 0.3141 - recall: 0.4923 - val_loss: 0.3541 - val_recall: 0.4479
Epoch 79/100
200/200 [==============================] - 0s 2ms/step - loss: 0.3121 - recall: 0.4969 - val_loss: 0.3579 - val_recall: 0.4755
Epoch 80/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3166 - recall: 0.4839 - val_loss: 0.3593 - val_recall: 0.4877
Epoch 81/100
200/200 [==============================] - 1s 2ms/step - loss: 0.3148 - recall: 0.4954 - val_loss: 0.3532 - val_recall: 0.4663
Epoch 82/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3116 - recall: 0.4916 - val_loss: 0.3656 - val_recall: 0.5215
Epoch 83/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3151 - recall: 0.4923 - val_loss: 0.3583 - val_recall: 0.4601
Epoch 84/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3133 - recall: 0.4801 - val_loss: 0.3674 - val_recall: 0.5092
Epoch 85/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3112 - recall: 0.4931 - val_loss: 0.3559 - val_recall: 0.4785
Epoch 86/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3152 - recall: 0.4908 - val_loss: 0.3624 - val_recall: 0.5245
Epoch 87/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3112 - recall: 0.4939 - val_loss: 0.3606 - val_recall: 0.4939
Epoch 88/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3152 - recall: 0.4801 - val_loss: 0.3655 - val_recall: 0.4877
Epoch 89/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3075 - recall: 0.5169 - val_loss: 0.3579 - val_recall: 0.4356
Epoch 90/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3156 - recall: 0.4847 - val_loss: 0.3564 - val_recall: 0.4847
Epoch 91/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3091 - recall: 0.5031 - val_loss: 0.3614 - val_recall: 0.4632
Epoch 92/100
200/200 [==============================] - 1s 5ms/step - loss: 0.3146 - recall: 0.4962 - val_loss: 0.3560 - val_recall: 0.4724
Epoch 93/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3117 - recall: 0.4801 - val_loss: 0.3563 - val_recall: 0.4755
Epoch 94/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3127 - recall: 0.4801 - val_loss: 0.3654 - val_recall: 0.5092
Epoch 95/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3112 - recall: 0.5000 - val_loss: 0.3621 - val_recall: 0.4663
Epoch 96/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3105 - recall: 0.5038 - val_loss: 0.3596 - val_recall: 0.4693
Epoch 97/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3122 - recall: 0.4885 - val_loss: 0.3588 - val_recall: 0.4755
Epoch 98/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3095 - recall: 0.4946 - val_loss: 0.3587 - val_recall: 0.4877
Epoch 99/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3096 - recall: 0.5008 - val_loss: 0.3599 - val_recall: 0.4693
Epoch 100/100
200/200 [==============================] - 1s 3ms/step - loss: 0.3103 - recall: 0.5061 - val_loss: 0.3596 - val_recall: 0.4816

Loss function

In [ ]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_2.history['loss'])
plt.plot(history_2.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

From the above plot, we can observe that the train and validation curves are having smooth lines. Reducing the number of neurons and adding dropouts to the model worked, and the problem of overfitting was solved.

In [ ]:
#Plotting Train recall vs Validation recall
plt.plot(history_2.history['recall'])
plt.plot(history_2.history['val_recall'])
plt.title('model recall')
plt.ylabel('recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
#Predicting the results using best as a threshold
y_train_pred = model_2.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 2ms/step
Out[ ]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [False]])
In [ ]:
#Predicting the results using 0.5 as the threshold.
y_val_pred = model_2.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[ ]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [ ]:
model_name = "NN with Adam & Dropout"

train_metric_df.loc[model_name] = recall_score(y_train,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)

Classification report

In [ ]:
#classification report
cr=classification_report(y_train,y_train_pred)
print("Classificaton Report of NN with Adam and dropout on training set")
print(cr)
Classificaton Report of NN with Adam and dropout on training set
              precision    recall  f1-score   support

           0       0.90      0.97      0.93      5096
           1       0.84      0.56      0.67      1304

    accuracy                           0.89      6400
   macro avg       0.87      0.76      0.80      6400
weighted avg       0.88      0.89      0.88      6400

In [ ]:
#classification report
cr = classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classificaton Report of NN with Adam and dropout on Validation set")
print(cr)
Classificaton Report of NN with Adam and dropout on Validation set
              precision    recall  f1-score   support

           0       0.88      0.96      0.92      1274
           1       0.73      0.48      0.58       326

    accuracy                           0.86      1600
   macro avg       0.81      0.72      0.75      1600
weighted avg       0.85      0.86      0.85      1600

Confusion matrix

In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_train, y_train_pred)
In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred)  ## Complete the code to check the model's performance on the validation set

Key Observations and Takeaways for NN with Adam Optimizer and Dropout Model Loss Training Loss: The training loss steadily decreases over the epochs, indicating that the model is learning and improving its performance on the training data. Validation Loss: The validation loss decreases initially but then fluctuates, indicating some level of overfitting. However, the fluctuations are not very large, suggesting that the dropout layers are helping to control overfitting to some extent. Model Recall Training Recall: The training recall improves steadily, indicating that the model is increasingly able to correctly identify positive cases in the training set. Validation Recall: The validation recall also improves but shows more fluctuations compared to the training recall, suggesting some variability in performance on the validation data. Confusion Matrix (Training Data) True Negatives (TN): 4959 (77.48%) False Positives (FP): 137 (2.14%) False Negatives (FN): 580 (9.06%) True Positives (TP): 724 (11.31%) Confusion Matrix (Validation Data) True Negatives (TN): 1217 (76.06%) False Positives (FP): 57 (3.56%) False Negatives (FN): 169 (10.56%) True Positives (TP): 157 (9.81%) Classification Report (Training Data) Precision: 0.90 (Class 0), 0.84 (Class 1) Recall: 0.97 (Class 0), 0.56 (Class 1) F1-Score: 0.93 (Class 0), 0.67 (Class 1) Accuracy: 0.89 Classification Report (Validation Data) Precision: 0.88 (Class 0), 0.73 (Class 1) Recall: 0.96 (Class 0), 0.48 (Class 1) F1-Score: 0.92 (Class 0), 0.58 (Class 1) Accuracy: 0.86 Key Takeaways Improvement Over Previous Model: The introduction of dropout layers has improved the model’s ability to generalize, as evidenced by the reduction in overfitting and improved validation metrics compared to the previous model without dropout. Recall: While the recall for the positive class (churn) is higher than in the previous models, it is still relatively low, indicating that the model is missing a significant number of positive cases. Precision: The precision for both classes is high, indicating that when the model predicts a class, it is likely to be correct. This is particularly important for the business context where false positives (predicting churn when the customer will not churn) could lead to unnecessary retention efforts. Overall Performance: The model shows good overall performance with an accuracy of 0.86 on the validation set. However, there is still room for improvement, particularly in increasing the recall for the positive class to ensure more churn cases are correctly identified. By incorporating dropout, the model has become more robust to overfitting and has shown a more stable performance across epochs. However, further improvements are needed to increase the recall for the positive class, which is critical in a churn prediction context.

Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer¶

Let's try to apply SMOTE to balance this dataset and then again apply hyperparamter tuning accordingly.

In [ ]:
sm  = SMOTE(random_state=42)
#Complete the code to fit SMOTE on the training data.
X_train_smote, y_train_smote= sm.fit_resample(X_train, y_train)
print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11)
After UpSampling, the shape of train_y: (10192,) 

After UpSampling, the shape of train_X: (10192, 11) After UpSampling, the shape of train_y: (10192,)

Let's build a model with the balanced dataset

In [ ]:
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [ ]:
# Initializing the model
model_3 = Sequential()
# Add the input layer with 32 neurons and relu activation function
model_3.add(Dense(32, activation='relu', input_dim=X_train_smote.shape[1]))
# Add a hidden layer with 16 neurons and relu activation function
model_3.add(Dense(16, activation='relu'))
# Add another hidden layer with 16 neurons and relu activation function
model_3.add(Dense(16, activation='relu'))
# Add the output layer with 1 neuron and sigmoid activation function
model_3.add(Dense(1, activation='sigmoid'))
In [ ]:
#Complete the code to use SGD as the optimizer.
optimizer = tf.keras.optimizers.SGD(0.001)

# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
In [ ]:
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [ ]:
model_3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dense_1 (Dense)             (None, 16)                528       
                                                                 
 dense_2 (Dense)             (None, 16)                272       
                                                                 
 dense_3 (Dense)             (None, 1)                 17        
                                                                 
=================================================================
Total params: 1201 (4.69 KB)
Trainable params: 1201 (4.69 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [ ]:
# Fitting the ANN
history_3 = model_3.fit(
    X_train_smote, y_train_smote,
    batch_size=32,  ## Specify the batch size to use
    epochs=50,  ## Specify the number of epochs
    verbose=1,
    validation_data=(X_val, y_val)
)
Epoch 1/50
319/319 [==============================] - 2s 3ms/step - loss: 0.6885 - recall: 0.4309 - val_loss: 0.6703 - val_recall: 0.3466
Epoch 2/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6820 - recall: 0.3819 - val_loss: 0.6568 - val_recall: 0.3190
Epoch 3/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6772 - recall: 0.3966 - val_loss: 0.6478 - val_recall: 0.3006
Epoch 4/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6732 - recall: 0.4237 - val_loss: 0.6419 - val_recall: 0.3436
Epoch 5/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6695 - recall: 0.4631 - val_loss: 0.6377 - val_recall: 0.3957
Epoch 6/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6660 - recall: 0.5124 - val_loss: 0.6344 - val_recall: 0.4294
Epoch 7/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6625 - recall: 0.5410 - val_loss: 0.6322 - val_recall: 0.4479
Epoch 8/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6589 - recall: 0.5810 - val_loss: 0.6299 - val_recall: 0.4847
Epoch 9/50
319/319 [==============================] - 1s 4ms/step - loss: 0.6553 - recall: 0.5987 - val_loss: 0.6280 - val_recall: 0.5184
Epoch 10/50
319/319 [==============================] - 2s 5ms/step - loss: 0.6517 - recall: 0.6207 - val_loss: 0.6260 - val_recall: 0.5460
Epoch 11/50
319/319 [==============================] - 1s 4ms/step - loss: 0.6480 - recall: 0.6366 - val_loss: 0.6242 - val_recall: 0.5613
Epoch 12/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6443 - recall: 0.6472 - val_loss: 0.6218 - val_recall: 0.5736
Epoch 13/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6406 - recall: 0.6515 - val_loss: 0.6209 - val_recall: 0.5828
Epoch 14/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6370 - recall: 0.6674 - val_loss: 0.6183 - val_recall: 0.5859
Epoch 15/50
319/319 [==============================] - 1s 2ms/step - loss: 0.6333 - recall: 0.6692 - val_loss: 0.6174 - val_recall: 0.5982
Epoch 16/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6296 - recall: 0.6811 - val_loss: 0.6156 - val_recall: 0.6074
Epoch 17/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6260 - recall: 0.6872 - val_loss: 0.6130 - val_recall: 0.6196
Epoch 18/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6224 - recall: 0.6913 - val_loss: 0.6103 - val_recall: 0.6166
Epoch 19/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6188 - recall: 0.6949 - val_loss: 0.6075 - val_recall: 0.6166
Epoch 20/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6154 - recall: 0.6927 - val_loss: 0.6071 - val_recall: 0.6288
Epoch 21/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6120 - recall: 0.6974 - val_loss: 0.6044 - val_recall: 0.6319
Epoch 22/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6086 - recall: 0.7041 - val_loss: 0.6001 - val_recall: 0.6258
Epoch 23/50
319/319 [==============================] - 1s 3ms/step - loss: 0.6054 - recall: 0.6972 - val_loss: 0.5994 - val_recall: 0.6288
Epoch 24/50
319/319 [==============================] - 1s 4ms/step - loss: 0.6023 - recall: 0.7057 - val_loss: 0.5972 - val_recall: 0.6288
Epoch 25/50
319/319 [==============================] - 1s 4ms/step - loss: 0.5992 - recall: 0.7057 - val_loss: 0.5965 - val_recall: 0.6380
Epoch 26/50
319/319 [==============================] - 1s 4ms/step - loss: 0.5963 - recall: 0.7098 - val_loss: 0.5944 - val_recall: 0.6442
Epoch 27/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5935 - recall: 0.7090 - val_loss: 0.5934 - val_recall: 0.6472
Epoch 28/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5909 - recall: 0.7135 - val_loss: 0.5904 - val_recall: 0.6472
Epoch 29/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5883 - recall: 0.7084 - val_loss: 0.5920 - val_recall: 0.6534
Epoch 30/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5860 - recall: 0.7129 - val_loss: 0.5880 - val_recall: 0.6472
Epoch 31/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5837 - recall: 0.7106 - val_loss: 0.5876 - val_recall: 0.6503
Epoch 32/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5817 - recall: 0.7143 - val_loss: 0.5846 - val_recall: 0.6472
Epoch 33/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5797 - recall: 0.7153 - val_loss: 0.5829 - val_recall: 0.6472
Epoch 34/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5779 - recall: 0.7162 - val_loss: 0.5816 - val_recall: 0.6534
Epoch 35/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5762 - recall: 0.7194 - val_loss: 0.5776 - val_recall: 0.6534
Epoch 36/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5747 - recall: 0.7162 - val_loss: 0.5788 - val_recall: 0.6656
Epoch 37/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5733 - recall: 0.7182 - val_loss: 0.5790 - val_recall: 0.6718
Epoch 38/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5719 - recall: 0.7227 - val_loss: 0.5762 - val_recall: 0.6687
Epoch 39/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5707 - recall: 0.7200 - val_loss: 0.5782 - val_recall: 0.6718
Epoch 40/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5695 - recall: 0.7243 - val_loss: 0.5765 - val_recall: 0.6748
Epoch 41/50
319/319 [==============================] - 1s 4ms/step - loss: 0.5684 - recall: 0.7296 - val_loss: 0.5726 - val_recall: 0.6626
Epoch 42/50
319/319 [==============================] - 1s 4ms/step - loss: 0.5673 - recall: 0.7308 - val_loss: 0.5697 - val_recall: 0.6626
Epoch 43/50
319/319 [==============================] - 2s 5ms/step - loss: 0.5663 - recall: 0.7259 - val_loss: 0.5717 - val_recall: 0.6687
Epoch 44/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5654 - recall: 0.7302 - val_loss: 0.5696 - val_recall: 0.6687
Epoch 45/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5645 - recall: 0.7316 - val_loss: 0.5687 - val_recall: 0.6718
Epoch 46/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5636 - recall: 0.7333 - val_loss: 0.5674 - val_recall: 0.6718
Epoch 47/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5627 - recall: 0.7278 - val_loss: 0.5702 - val_recall: 0.6810
Epoch 48/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5619 - recall: 0.7333 - val_loss: 0.5678 - val_recall: 0.6779
Epoch 49/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5611 - recall: 0.7323 - val_loss: 0.5683 - val_recall: 0.6779
Epoch 50/50
319/319 [==============================] - 1s 2ms/step - loss: 0.5604 - recall: 0.7329 - val_loss: 0.5677 - val_recall: 0.6779

Loss function

In [ ]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
#Plotting Train recall vs Validation recall
plt.plot(history_3.history['recall'])
plt.plot(history_3.history['val_recall'])
plt.title('model recall')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
y_train_pred = model_3.predict(X_train_smote)
#Predicting the results using 0.5 as the threshold
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 1ms/step
Out[ ]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [ ]:
y_val_pred = model_3.predict(X_val)
#Predicting the results using 0.5 as the threshold
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 2ms/step
Out[ ]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [ ]:
model_name = "NN with SMOTE & SGD"

train_metric_df.loc[model_name] = recall_score(y_train_smote,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)

Classification report

In [ ]:
cr=classification_report(y_train_smote,y_train_pred)
print("Classification report of NN with SMOTE & SGD on training set")
print(cr)
Classification report of NN with SMOTE & SGD on training set
              precision    recall  f1-score   support

           0       0.73      0.71      0.72      5096
           1       0.72      0.73      0.73      5096

    accuracy                           0.72     10192
   macro avg       0.72      0.72      0.72     10192
weighted avg       0.72      0.72      0.72     10192

In [ ]:
cr=classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification report of NN with SMOTE & SGD on validation set")
print(cr)
Classification report of NN with SMOTE & SGD on validation set
              precision    recall  f1-score   support

           0       0.90      0.72      0.80      1274
           1       0.38      0.68      0.49       326

    accuracy                           0.71      1600
   macro avg       0.64      0.70      0.64      1600
weighted avg       0.79      0.71      0.74      1600

Confusion matrix

In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_train_smote, y_train_pred)
In [ ]:
#Calculating the confusion matrix

make_confusion_matrix(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set

Key Observations and Takeaways for NN with SMOTE & SGD

Model Loss Training Loss: The training loss steadily decreases over the epochs, indicating that the model is learning and improving its performance on the training data. Validation Loss: The validation loss closely follows the training loss, decreasing over time. This indicates that the model is generalizing well to unseen data without significant overfitting. Model Recall Training Recall: The training recall shows a significant improvement during the initial epochs and then stabilizes around 73%, indicating that the model is increasingly able to correctly identify positive cases in the training set. Validation Recall: The validation recall also improves significantly and stabilizes around 68%. The validation recall follows a similar trend to the training recall, suggesting consistent performance. Confusion Matrix Training Data:

True Negatives (TN): 3635 (35.67%) False Positives (FP): 1461 (14.33%) False Negatives (FN): 1352 (13.27%) True Positives (TP): 3744 (36.73%) Validation Data:

True Negatives (TN): 917 (57.31%) False Positives (FP): 357 (22.31%) False Negatives (FN): 105 (6.56%) True Positives (TP): 221 (13.81%) Classification Report Training Data:

Precision: 0.73 (Class 0), 0.72 (Class 1) Recall: 0.71 (Class 0), 0.73 (Class 1) F1-Score: 0.72 (Class 0), 0.73 (Class 1) Accuracy: 0.72 Validation Data:

Precision: 0.90 (Class 0), 0.38 (Class 1) Recall: 0.72 (Class 0), 0.68 (Class 1) F1-Score: 0.80 (Class 0), 0.49 (Class 1) Accuracy: 0.71 Key Takeaways Balanced Performance on Training Data:

The application of SMOTE has balanced the class distribution, leading to a more balanced performance on the training set. Both precision and recall are approximately 72-73% for both classes. Improvement in Recall for Positive Class:

The recall for the positive class (churn) has significantly improved compared to previous models without SMOTE, indicating that the model is now better at identifying churn cases in the training data. Validation Performance:

The validation performance shows a significant drop in precision for the positive class (churn) but maintains a reasonably high recall. This suggests that while the model is good at identifying churn cases, it also has a higher rate of false positives in the validation data. Generalization:

The model generalizes well to unseen data, as indicated by the close alignment of training and validation loss curves. However, the precision for the churn class on the validation set indicates room for improvement in reducing false positives. Overall Performance:

The overall accuracy of 71% on the validation set is decent, with a balanced recall. The model has improved recall for the positive class, which is critical in a churn prediction context to ensure more churn cases are correctly identified. By applying SMOTE and using the SGD optimizer, the model has achieved a more balanced recall for both classes and demonstrates good generalization capability. However, there is still room for improvement in precision for the churn class to reduce false positives and improve the model's overall reliability

Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer¶

Let's build a model with the balanced dataset

In [ ]:
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [ ]:
# Initializing the model
model_4 = Sequential()
# Complete the code to add an input layer (specify the # of neurons and activation function)
model_4.add(Dense(32, activation='relu', input_dim=X_train_smote.shape[1]))
# Complete the code to add a hidden layer (specify the # of neurons and the activation function)
model_4.add(Dense(16, activation='relu'))
# Complete the code to add another hidden layer (specify the # of neurons and the activation function)
model_4.add(Dense(16, activation='relu'))
# Complete the code to add the required number of neurons in the output layer and a suitable activation function
model_4.add(Dense(1, activation='sigmoid'))
In [ ]:
model_4.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dense_1 (Dense)             (None, 16)                528       
                                                                 
 dense_2 (Dense)             (None, 16)                272       
                                                                 
 dense_3 (Dense)             (None, 1)                 17        
                                                                 
=================================================================
Total params: 1201 (4.69 KB)
Trainable params: 1201 (4.69 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [ ]:
#Complete the code to use Adam as the optimizer.
optimizer = tf.keras.optimizers.Adam()

# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
In [ ]:
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [ ]:
model_4.summary()
In [ ]:
# Fitting the ANN

history_4 = model_4.fit(
    X_train_smote, y_train_smote,
    batch_size=32,  ## Batch size to use
    epochs=50,  ## Number of epochs
    verbose=1,
    validation_data=(X_val, y_val)
)
Epoch 1/50
319/319 [==============================] - 3s 5ms/step - loss: 0.5890 - recall: 0.6786 - val_loss: 0.5668 - val_recall: 0.6994
Epoch 2/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5355 - recall: 0.7433 - val_loss: 0.5250 - val_recall: 0.6840
Epoch 3/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4993 - recall: 0.7584 - val_loss: 0.5235 - val_recall: 0.7178
Epoch 4/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4711 - recall: 0.7734 - val_loss: 0.4718 - val_recall: 0.6718
Epoch 5/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4511 - recall: 0.7804 - val_loss: 0.5116 - val_recall: 0.7485
Epoch 6/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4378 - recall: 0.7891 - val_loss: 0.4825 - val_recall: 0.7393
Epoch 7/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4268 - recall: 0.7957 - val_loss: 0.4457 - val_recall: 0.6963
Epoch 8/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4162 - recall: 0.7977 - val_loss: 0.4355 - val_recall: 0.6871
Epoch 9/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4090 - recall: 0.8030 - val_loss: 0.4457 - val_recall: 0.6963
Epoch 10/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4045 - recall: 0.8106 - val_loss: 0.4361 - val_recall: 0.6840
Epoch 11/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4003 - recall: 0.8075 - val_loss: 0.4829 - val_recall: 0.7546
Epoch 12/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3967 - recall: 0.8134 - val_loss: 0.4163 - val_recall: 0.5951
Epoch 13/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3941 - recall: 0.8114 - val_loss: 0.4372 - val_recall: 0.6933
Epoch 14/50
319/319 [==============================] - 1s 5ms/step - loss: 0.3915 - recall: 0.8173 - val_loss: 0.4883 - val_recall: 0.7270
Epoch 15/50
319/319 [==============================] - 2s 5ms/step - loss: 0.3872 - recall: 0.8191 - val_loss: 0.4773 - val_recall: 0.7331
Epoch 16/50
319/319 [==============================] - 2s 5ms/step - loss: 0.3849 - recall: 0.8228 - val_loss: 0.4503 - val_recall: 0.7025
Epoch 17/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3848 - recall: 0.8255 - val_loss: 0.4790 - val_recall: 0.7362
Epoch 18/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3821 - recall: 0.8279 - val_loss: 0.4313 - val_recall: 0.6411
Epoch 19/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3822 - recall: 0.8214 - val_loss: 0.4268 - val_recall: 0.6534
Epoch 20/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3797 - recall: 0.8234 - val_loss: 0.4568 - val_recall: 0.7117
Epoch 21/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3769 - recall: 0.8289 - val_loss: 0.4174 - val_recall: 0.6104
Epoch 22/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3767 - recall: 0.8291 - val_loss: 0.4212 - val_recall: 0.6166
Epoch 23/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3730 - recall: 0.8301 - val_loss: 0.4373 - val_recall: 0.6871
Epoch 24/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3733 - recall: 0.8332 - val_loss: 0.4384 - val_recall: 0.6779
Epoch 25/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3709 - recall: 0.8367 - val_loss: 0.4477 - val_recall: 0.6503
Epoch 26/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3679 - recall: 0.8363 - val_loss: 0.4642 - val_recall: 0.6994
Epoch 27/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3675 - recall: 0.8371 - val_loss: 0.4325 - val_recall: 0.6656
Epoch 28/50
319/319 [==============================] - 1s 4ms/step - loss: 0.3659 - recall: 0.8389 - val_loss: 0.4414 - val_recall: 0.6656
Epoch 29/50
319/319 [==============================] - 1s 5ms/step - loss: 0.3629 - recall: 0.8432 - val_loss: 0.4497 - val_recall: 0.6810
Epoch 30/50
319/319 [==============================] - 1s 5ms/step - loss: 0.3616 - recall: 0.8444 - val_loss: 0.5024 - val_recall: 0.7515
Epoch 31/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3614 - recall: 0.8420 - val_loss: 0.4517 - val_recall: 0.6902
Epoch 32/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3568 - recall: 0.8487 - val_loss: 0.4238 - val_recall: 0.6380
Epoch 33/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3576 - recall: 0.8438 - val_loss: 0.4938 - val_recall: 0.7546
Epoch 34/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3550 - recall: 0.8548 - val_loss: 0.4234 - val_recall: 0.6288
Epoch 35/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3543 - recall: 0.8511 - val_loss: 0.4444 - val_recall: 0.6779
Epoch 36/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3523 - recall: 0.8536 - val_loss: 0.4170 - val_recall: 0.5951
Epoch 37/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3508 - recall: 0.8518 - val_loss: 0.4995 - val_recall: 0.7485
Epoch 38/50
319/319 [==============================] - 1s 2ms/step - loss: 0.3497 - recall: 0.8546 - val_loss: 0.4564 - val_recall: 0.6595
Epoch 39/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3481 - recall: 0.8552 - val_loss: 0.4399 - val_recall: 0.6534
Epoch 40/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3468 - recall: 0.8595 - val_loss: 0.4688 - val_recall: 0.7055
Epoch 41/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3461 - recall: 0.8571 - val_loss: 0.4709 - val_recall: 0.6840
Epoch 42/50
319/319 [==============================] - 1s 4ms/step - loss: 0.3442 - recall: 0.8624 - val_loss: 0.4269 - val_recall: 0.6258
Epoch 43/50
319/319 [==============================] - 1s 4ms/step - loss: 0.3443 - recall: 0.8583 - val_loss: 0.4380 - val_recall: 0.6626
Epoch 44/50
319/319 [==============================] - 2s 5ms/step - loss: 0.3432 - recall: 0.8617 - val_loss: 0.4576 - val_recall: 0.6871
Epoch 45/50
319/319 [==============================] - 1s 4ms/step - loss: 0.3404 - recall: 0.8599 - val_loss: 0.4368 - val_recall: 0.6319
Epoch 46/50
319/319 [==============================] - 1s 2ms/step - loss: 0.3404 - recall: 0.8620 - val_loss: 0.4375 - val_recall: 0.6350
Epoch 47/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3383 - recall: 0.8575 - val_loss: 0.4633 - val_recall: 0.6718
Epoch 48/50
319/319 [==============================] - 1s 3ms/step - loss: 0.3377 - recall: 0.8634 - val_loss: 0.4754 - val_recall: 0.7117
Epoch 49/50
319/319 [==============================] - 1s 2ms/step - loss: 0.3379 - recall: 0.8591 - val_loss: 0.4660 - val_recall: 0.6840
Epoch 50/50
319/319 [==============================] - 1s 2ms/step - loss: 0.3361 - recall: 0.8660 - val_loss: 0.4354 - val_recall: 0.6135

Loss function

In [ ]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_4.history['loss'])
plt.plot(history_4.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
#Plotting Train recall vs Validation recall
plt.plot(history_4.history['recall'])
plt.plot(history_4.history['val_recall'])
plt.title('model recall')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
y_train_pred = model_4.predict(X_train_smote)
#Predicting the results using 0.5 as the threshold
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 0s 1ms/step
Out[ ]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [False],
       [ True]])
In [ ]:
y_val_pred = model_4.predict(X_val)
#Predicting the results using 0.5 as the threshold
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[ ]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [False],
       [ True]])
In [ ]:
model_name = "NN with SMOTE & Adam"

train_metric_df.loc[model_name] = recall_score(y_train_smote,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)

Classification report

In [ ]:
cr=classification_report(y_train_smote,y_train_pred)
print("Classification report on NN with SMOTE & Adam on training set ")
print(cr)
Classification report on NN with SMOTE & Adam on training set 
              precision    recall  f1-score   support

           0       0.85      0.88      0.86      5096
           1       0.87      0.85      0.86      5096

    accuracy                           0.86     10192
   macro avg       0.86      0.86      0.86     10192
weighted avg       0.86      0.86      0.86     10192

In [ ]:
cr=classification_report(y_val,y_val_pred) ## Complete the code to check the model's performance on the validation set
print("Classification report on NN with SMOTE & Adam on validation set ")
print(cr)
Classification report on NN with SMOTE & Adam on validation set 
              precision    recall  f1-score   support

           0       0.90      0.85      0.87      1274
           1       0.51      0.61      0.56       326

    accuracy                           0.80      1600
   macro avg       0.71      0.73      0.72      1600
weighted avg       0.82      0.80      0.81      1600

Confusion matrix

In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_train_smote, y_train_pred)
In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred)  ## Complete the code to check the model's performance on the validation set

Key Observations and Takeaways for NN with SMOTE & Adam Optimizer

Model Loss

Training Loss: The training loss decreases steadily over the epochs, indicating that the model is learning and improving its performance on the training data. Validation Loss: The validation loss decreases initially but shows fluctuations. This suggests some level of overfitting, though the overall trend indicates improvement. Model Recall

Training Recall: The training recall improves steadily, indicating that the model is increasingly able to correctly identify positive cases in the training set. Validation Recall: The validation recall also improves but fluctuates more compared to the training recall, suggesting some variability in performance on the validation data. Confusion Matrix (Training Data)

True Negatives (TN): 4470 (43.86%) False Positives (FP): 626 (6.14%) False Negatives (FN): 773 (7.58%) True Positives (TP): 4323 (42.42%) Confusion Matrix (Validation Data)

True Negatives (TN): 1085 (67.81%) False Positives (FP): 189 (11.81%) False Negatives (FN): 126 (7.88%) True Positives (TP): 200 (12.50%) Classification Report (Training Data)

Precision: 0.85 (Class 0), 0.87 (Class 1) Recall: 0.88 (Class 0), 0.85 (Class 1) F1-Score: 0.86 (Class 0), 0.86 (Class 1) Accuracy: 0.86 Classification Report (Validation Data)

Precision: 0.90 (Class 0), 0.51 (Class 1) Recall: 0.85 (Class 0), 0.61 (Class 1) F1-Score: 0.87 (Class 0), 0.56 (Class 1) Accuracy: 0.80 Key Takeaways

Balanced Data Impact: Applying SMOTE has helped balance the classes, leading to improved recall for the positive class (churn) on the training data. This ensures the model is better at identifying churn cases. Overfitting Concerns: The fluctuations in validation loss and recall suggest some overfitting. Despite using Adam optimizer, the model struggles to generalize perfectly to the validation data. Precision and Recall: The precision for the positive class (churn) is moderate, indicating that while the model can identify churn cases, it is not as confident as for the non-churn cases. The recall for the positive class on the validation set is an improvement over previous models but still shows room for growth. Overall Performance: The model shows good overall performance with an accuracy of 0.80 on the validation set. The use of SMOTE and the Adam optimizer has improved the model's ability to detect positive cases compared to previous attempts. In summary, while the model with SMOTE and Adam optimizer shows improved recall for the positive class, there is still variability and room for improvement in the model's performance on the validation set. Further tuning and perhaps additional regularization methods could help enhance stability and performance

Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout¶

In [ ]:
backend.clear_session()
#Fixing the seed for random number generators so that we can ensure we receive the same output everytime
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [ ]:
# Initializing the model
model_5 = Sequential()
# Adding the input layer with 32 neurons and relu as activation function
model_5.add(Dense(32, activation='relu', input_dim=X_train_smote.shape[1]))
# Adding dropout with a rate of 0.2
model_5.add(Dropout(0.2))
# Adding a hidden layer with 16 neurons and relu as activation function
model_5.add(Dense(16, activation='relu'))
# Adding dropout with a rate of 0.2
model_5.add(Dropout(0.2))
# Adding hidden layer with 8 neurons and relu as activation function
model_5.add(Dense(8, activation='relu'))
# Adding the output layer with 1 neuron and sigmoid as activation function
model_5.add(Dense(1, activation='sigmoid'))
In [ ]:
#Complete the code to use Adam as the optimizer.
optimizer = tf.keras.optimizers.Adam()

# uncomment one of the following lines to define the metric to be used
# metric = 'accuracy'
metric = keras.metrics.Recall()
# metric = keras.metrics.Precision()
# metric = keras.metrics.F1Score()
In [ ]:
# Complete the code to compile the model with binary cross entropy as loss function and recall as the metric
model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [ ]:
model_5.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dropout (Dropout)           (None, 32)                0         
                                                                 
 dense_1 (Dense)             (None, 16)                528       
                                                                 
 dropout_1 (Dropout)         (None, 16)                0         
                                                                 
 dense_2 (Dense)             (None, 8)                 136       
                                                                 
 dense_3 (Dense)             (None, 1)                 9         
                                                                 
=================================================================
Total params: 1057 (4.13 KB)
Trainable params: 1057 (4.13 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [ ]:
# Fitting the ANN
history_5 = model_5.fit(
    X_train_smote, y_train_smote,
    batch_size=32, # Specify the batch size to use
    epochs=50,  # Specify the number of epochs
    verbose=1,
    validation_data=(X_val, y_val)
)
Epoch 1/50
319/319 [==============================] - 2s 3ms/step - loss: 0.6299 - recall: 0.6703 - val_loss: 0.5559 - val_recall: 0.6595
Epoch 2/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5768 - recall: 0.7206 - val_loss: 0.5516 - val_recall: 0.6963
Epoch 3/50
319/319 [==============================] - 1s 4ms/step - loss: 0.5579 - recall: 0.7359 - val_loss: 0.5347 - val_recall: 0.6656
Epoch 4/50
319/319 [==============================] - 2s 5ms/step - loss: 0.5512 - recall: 0.7382 - val_loss: 0.5276 - val_recall: 0.6748
Epoch 5/50
319/319 [==============================] - 1s 4ms/step - loss: 0.5407 - recall: 0.7416 - val_loss: 0.5338 - val_recall: 0.6871
Epoch 6/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5278 - recall: 0.7382 - val_loss: 0.5090 - val_recall: 0.6779
Epoch 7/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5158 - recall: 0.7422 - val_loss: 0.5007 - val_recall: 0.6656
Epoch 8/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5064 - recall: 0.7473 - val_loss: 0.4899 - val_recall: 0.6718
Epoch 9/50
319/319 [==============================] - 1s 3ms/step - loss: 0.5012 - recall: 0.7433 - val_loss: 0.5126 - val_recall: 0.6963
Epoch 10/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4952 - recall: 0.7490 - val_loss: 0.4744 - val_recall: 0.6595
Epoch 11/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4888 - recall: 0.7551 - val_loss: 0.4944 - val_recall: 0.7117
Epoch 12/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4814 - recall: 0.7649 - val_loss: 0.4954 - val_recall: 0.7178
Epoch 13/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4735 - recall: 0.7688 - val_loss: 0.4636 - val_recall: 0.6871
Epoch 14/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4730 - recall: 0.7732 - val_loss: 0.4943 - val_recall: 0.7178
Epoch 15/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4640 - recall: 0.7745 - val_loss: 0.4973 - val_recall: 0.7423
Epoch 16/50
319/319 [==============================] - 1s 4ms/step - loss: 0.4596 - recall: 0.7818 - val_loss: 0.4777 - val_recall: 0.7147
Epoch 17/50
319/319 [==============================] - 1s 4ms/step - loss: 0.4555 - recall: 0.7783 - val_loss: 0.4700 - val_recall: 0.7270
Epoch 18/50
319/319 [==============================] - 2s 5ms/step - loss: 0.4517 - recall: 0.7920 - val_loss: 0.4435 - val_recall: 0.7025
Epoch 19/50
319/319 [==============================] - 1s 4ms/step - loss: 0.4489 - recall: 0.7875 - val_loss: 0.4508 - val_recall: 0.7178
Epoch 20/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4461 - recall: 0.7920 - val_loss: 0.4500 - val_recall: 0.7025
Epoch 21/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4425 - recall: 0.7918 - val_loss: 0.4570 - val_recall: 0.7362
Epoch 22/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4413 - recall: 0.7967 - val_loss: 0.4402 - val_recall: 0.6994
Epoch 23/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4348 - recall: 0.7961 - val_loss: 0.4665 - val_recall: 0.7515
Epoch 24/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4347 - recall: 0.8010 - val_loss: 0.4572 - val_recall: 0.7331
Epoch 25/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4314 - recall: 0.8000 - val_loss: 0.4512 - val_recall: 0.7209
Epoch 26/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4292 - recall: 0.8083 - val_loss: 0.4429 - val_recall: 0.7178
Epoch 27/50
319/319 [==============================] - 1s 5ms/step - loss: 0.4320 - recall: 0.8012 - val_loss: 0.4402 - val_recall: 0.7178
Epoch 28/50
319/319 [==============================] - 2s 5ms/step - loss: 0.4207 - recall: 0.8061 - val_loss: 0.4518 - val_recall: 0.7270
Epoch 29/50
319/319 [==============================] - 2s 6ms/step - loss: 0.4299 - recall: 0.8002 - val_loss: 0.4591 - val_recall: 0.7454
Epoch 30/50
319/319 [==============================] - 1s 5ms/step - loss: 0.4251 - recall: 0.7996 - val_loss: 0.4568 - val_recall: 0.7423
Epoch 31/50
319/319 [==============================] - 2s 5ms/step - loss: 0.4229 - recall: 0.8012 - val_loss: 0.4740 - val_recall: 0.7577
Epoch 32/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4222 - recall: 0.8142 - val_loss: 0.4399 - val_recall: 0.7239
Epoch 33/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4196 - recall: 0.8026 - val_loss: 0.4578 - val_recall: 0.7577
Epoch 34/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4195 - recall: 0.8104 - val_loss: 0.4332 - val_recall: 0.7301
Epoch 35/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4169 - recall: 0.8106 - val_loss: 0.4494 - val_recall: 0.7362
Epoch 36/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4159 - recall: 0.8124 - val_loss: 0.4263 - val_recall: 0.6994
Epoch 37/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4180 - recall: 0.8065 - val_loss: 0.4497 - val_recall: 0.7423
Epoch 38/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4101 - recall: 0.8161 - val_loss: 0.4534 - val_recall: 0.7485
Epoch 39/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4118 - recall: 0.8191 - val_loss: 0.4498 - val_recall: 0.7485
Epoch 40/50
319/319 [==============================] - 1s 2ms/step - loss: 0.4134 - recall: 0.8173 - val_loss: 0.4558 - val_recall: 0.7515
Epoch 41/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4206 - recall: 0.8118 - val_loss: 0.4355 - val_recall: 0.7086
Epoch 42/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4136 - recall: 0.8165 - val_loss: 0.4463 - val_recall: 0.7423
Epoch 43/50
319/319 [==============================] - 2s 5ms/step - loss: 0.4085 - recall: 0.8146 - val_loss: 0.4440 - val_recall: 0.7577
Epoch 44/50
319/319 [==============================] - 1s 5ms/step - loss: 0.4123 - recall: 0.8134 - val_loss: 0.4547 - val_recall: 0.7423
Epoch 45/50
319/319 [==============================] - 1s 5ms/step - loss: 0.4092 - recall: 0.8216 - val_loss: 0.4396 - val_recall: 0.7393
Epoch 46/50
319/319 [==============================] - 1s 4ms/step - loss: 0.4095 - recall: 0.8238 - val_loss: 0.4428 - val_recall: 0.7546
Epoch 47/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4056 - recall: 0.8208 - val_loss: 0.4420 - val_recall: 0.7546
Epoch 48/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4114 - recall: 0.8195 - val_loss: 0.4215 - val_recall: 0.7086
Epoch 49/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4089 - recall: 0.8226 - val_loss: 0.4358 - val_recall: 0.7239
Epoch 50/50
319/319 [==============================] - 1s 3ms/step - loss: 0.4108 - recall: 0.8189 - val_loss: 0.4379 - val_recall: 0.7423

Loss function

In [ ]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_5.history['loss'])
plt.plot(history_5.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
#Plotting Train recall vs Validation recall
plt.plot(history_5.history['recall'])
plt.plot(history_5.history['val_recall'])
plt.title('model recall')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
In [ ]:
y_train_pred = model_5.predict(X_train_smote)
#Predicting the results using 0.5 as the threshold
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 3ms/step
Out[ ]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [ ]:
y_val_pred = model_5.predict(X_val)
#Predicting the results using 0.5 as the threshold
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 2ms/step
Out[ ]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [ ]:
model_name = "NN with SMOTE,Adam & Dropout"

train_metric_df.loc[model_name] = recall_score(y_train_smote,y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val,y_val_pred)

Classification report

In [ ]:
cr=classification_report(y_train_smote,y_train_pred)
print("Classification Report of NN with SMOTE, Adam and Dropout on training set")
print(cr)
Classification Report of NN with SMOTE, Adam and Dropout on training set
              precision    recall  f1-score   support

           0       0.85      0.83      0.84      5096
           1       0.83      0.85      0.84      5096

    accuracy                           0.84     10192
   macro avg       0.84      0.84      0.84     10192
weighted avg       0.84      0.84      0.84     10192

In [ ]:
#classification report
cr=classification_report(y_val,y_val_pred)  ## Complete the code to check the model's performance on the validation set
print("Classification Report of NN with SMOTE, Adam and Dropout on validation set")
print(cr)
Classification Report of NN with SMOTE, Adam and Dropout on validation set
              precision    recall  f1-score   support

           0       0.93      0.82      0.87      1274
           1       0.51      0.74      0.61       326

    accuracy                           0.80      1600
   macro avg       0.72      0.78      0.74      1600
weighted avg       0.84      0.80      0.82      1600

Confusion matrix

In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_train_smote, y_train_pred)
In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_val,y_val_pred)  ## Complete the code to check the model's performance on the validation set

Key Observations and Takeaways for NN with SMOTE, Adam Optimizer, and Dropout

Model Loss

Training Loss: The training loss steadily decreases over the epochs, indicating that the model is learning and improving its performance on the training data. Validation Loss: The validation loss also decreases but with more fluctuations compared to the training loss, suggesting some variability in performance on the validation set. Model Recall

Training Recall: The training recall improves steadily, indicating the model's increasing ability to correctly identify positive cases in the training set. Validation Recall: The validation recall shows improvement but fluctuates significantly, indicating variability in the model's performance on the validation data. Confusion Matrix (Training Data)

True Negatives (TN): 4220 (41.41%) False Positives (FP): 876 (8.59%) False Negatives (FN): 758 (7.44%) True Positives (TP): 4338 (42.56%) Confusion Matrix (Validation Data)

True Negatives (TN): 1043 (65.19%) False Positives (FP): 231 (14.44%) False Negatives (FN): 84 (5.25%) True Positives (TP): 242 (15.12%) Classification Report (Training Data)

Precision: 0.85 (Class 0), 0.83 (Class 1) Recall: 0.83 (Class 0), 0.85 (Class 1) F1-Score: 0.84 (Class 0), 0.84 (Class 1) Accuracy: 0.84 Classification Report (Validation Data)

Precision: 0.93 (Class 0), 0.51 (Class 1) Recall: 0.82 (Class 0), 0.74 (Class 1) F1-Score: 0.87 (Class 0), 0.61 (Class 1) Accuracy: 0.80 Key Takeaways

Improvement Over Previous Models: The inclusion of SMOTE, Adam Optimizer, and Dropout layers has generally improved the model's ability to generalize, as seen by the relatively stable validation loss and improved recall on the validation set compared to models without these techniques. Recall: The recall for the positive class (churn) is significantly improved compared to earlier models, indicating that the model is better at identifying positive cases, though there is still room for improvement. Precision: The precision for the positive class is lower than the negative class, suggesting a higher number of false positives, which could lead to unnecessary retention efforts. Overall Performance: The model shows good overall performance with an accuracy of 0.80 on the validation set. The inclusion of dropout has helped to control overfitting, as indicated by the training and validation loss curves. Variability in Validation Metrics: The fluctuations in validation recall and loss suggest that the model's performance on unseen data can vary, indicating potential areas for further model tuning and improvement. By incorporating SMOTE, the Adam optimizer, and dropout, the model has become more robust to overfitting and has shown improved recall for the positive class. However, further efforts are needed to balance precision and recall, particularly for the positive class.

Model Performance Comparison and Final Model Selection¶

In [ ]:
print("Training performance comparison")
train_metric_df
Training performance comparison
Out[ ]:
recall
NN with SGD 0.128834
NN with Adam 0.597393
NN with Adam & Dropout 0.555215
NN with SMOTE & SGD 0.734694
NN with SMOTE & Adam 0.848312
NN with SMOTE,Adam & Dropout 0.851256
In [ ]:
print("Validation set performance comparison")
valid_metric_df
Validation set performance comparison
Out[ ]:
recall
NN with SGD 0.092025
NN with Adam 0.457055
NN with Adam & Dropout 0.481595
NN with SMOTE & SGD 0.677914
NN with SMOTE & Adam 0.613497
NN with SMOTE,Adam & Dropout 0.742331
In [ ]:
train_metric_df - valid_metric_df
Out[ ]:
recall
NN with SGD 0.036810
NN with Adam 0.140337
NN with Adam & Dropout 0.073620
NN with SMOTE & SGD 0.056780
NN with SMOTE & Adam 0.234815
NN with SMOTE,Adam & Dropout 0.108925

Based on the results provided, the best model can be determined by looking at the recall on both the training and validation sets, as well as the difference between training and validation recalls to assess overfitting.

Here are the key points from the data:

Training Set Recall: Higher values indicate better performance on the training set. Validation Set Recall: Higher values indicate better generalization to unseen data. Difference between Training and Validation Recall: Smaller differences indicate less overfitting. Training Set Recall NN with SGD: 0.128834 NN with Adam: 0.597393 NN with Adam & Dropout: 0.555215 NN with SMOTE & SGD: 0.734694 NN with SMOTE & Adam: 0.848312 NN with SMOTE, Adam & Dropout: 0.851256 Validation Set Recall NN with SGD: 0.092025 NN with Adam: 0.457055 NN with Adam & Dropout: 0.481595 NN with SMOTE & SGD: 0.677914 NN with SMOTE & Adam: 0.613497 NN with SMOTE, Adam & Dropout: 0.742331 Difference between Training and Validation Recall NN with SGD: 0.036810 NN with Adam: 0.140337 NN with Adam & Dropout: 0.073620 NN with SMOTE & SGD: 0.056780 NN with SMOTE & Adam: 0.234815 NN with SMOTE, Adam & Dropout: 0.108925 Analysis Highest Validation Recall:

The highest recall on the validation set is from the NN with SMOTE, Adam & Dropout model (0.742331). Balanced Performance:

While NN with SMOTE, Adam & Dropout has a high recall on both training and validation sets, its difference between training and validation recall (0.108925) is acceptable, indicating reasonable generalization without excessive overfitting. Low Overfitting:

NN with SMOTE & SGD also shows good performance with a validation recall of 0.677914 and a small difference between training and validation recalls (0.056780). Considering the combination of high recall on the validation set and a reasonably small difference between training and validation recall, the NN with SMOTE, Adam & Dropout model appears to be the best overall.

Conclusion NN with SMOTE, Adam & Dropout is the best model based on these results due to its highest validation recall and balanced performance between training and validation sets.

In [ ]:
y_test_pred = model_5.predict(X_test)    ## Complete the code to specify the best model
y_test_pred = (y_test_pred > 0.5)
print(y_test_pred)
63/63 [==============================] - 0s 1ms/step
[[False]
 [False]
 [False]
 ...
 [ True]
 [False]
 [False]]
In [ ]:
#lets print classification report
cr=classification_report(y_test,y_test_pred)
print(cr)
              precision    recall  f1-score   support

           0       0.93      0.81      0.86      1593
           1       0.50      0.74      0.60       407

    accuracy                           0.79      2000
   macro avg       0.71      0.78      0.73      2000
weighted avg       0.84      0.79      0.81      2000

In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_test,y_test_pred)

Observations and Key Takeaways for Model 5 NN with SMOTE, Adam & Dropout on Test Data

Observations Confusion Matrix:

True Negatives (TN): 1286 (64.30%) False Positives (FP): 307 (15.35%) False Negatives (FN): 104 (5.20%) True Positives (TP): 303 (15.15%) Classification Report:

Class 0 (Not Churn): Precision: 0.93 Recall: 0.81 F1-Score: 0.86 Support: 1593 Class 1 (Churn): Precision: 0.50 Recall: 0.74 F1-Score: 0.60 Support: 407 Overall Performance:

Accuracy: 0.79 Macro Average: Precision: 0.71 Recall: 0.78 F1-Score: 0.73 Weighted Average: Precision: 0.84 Recall: 0.79 F1-Score: 0.81 Key Takeaways Precision and Recall:

The precision for class 0 is high (0.93), indicating that most of the predicted not churn instances are correct. The recall for class 1 is relatively high (0.74), meaning the model is effective in identifying a significant number of actual churn instances. Balanced Performance:

The model shows a balanced performance between precision and recall for both classes, with a slight bias towards not missing churn cases, as indicated by the higher recall for class 1. However, the precision for class 1 (0.50) is lower, indicating a higher number of false positives for churn predictions. F1-Score:

The F1-Score for class 1 is 0.60, which suggests that there is still room for improvement in terms of balancing precision and recall for churn predictions. Overall Accuracy:

An overall accuracy of 0.79 on the test set demonstrates the model's effectiveness in predicting both churn and not churn instances, but with a noticeable number of misclassifications. Macro vs. Weighted Averages:

The macro average recall (0.78) being slightly higher than the accuracy (0.79) indicates that the model is fairly consistent across both classes. The weighted averages show a strong overall performance, especially considering the higher number of not churn cases. Conclusion The NN with SMOTE, Adam & Dropout model demonstrates solid performance on the test data, particularly with a high recall for churn cases, which is crucial for minimizing missed churn predictions. Despite some false positives, the model maintains a good balance and can be considered effective for practical application in churn prediction. Further tuning might be needed to enhance precision for churn predictions while maintaining or improving recall.

Actionable Insights and Business Recommendations¶

Actionable Insights and Business Recommendations

Actionable Insights High Recall for Churn Detection:

The model has a high recall for detecting churn cases, meaning it successfully identifies most of the customers who are likely to churn. This is crucial for proactive retention strategies. Precision Trade-off:

While the recall for churn is high, the precision is moderate. This indicates that while many churners are correctly identified, there are also a number of false positives. This means some customers predicted to churn may not actually do so. Overall Model Performance:

The model’s overall accuracy and balanced performance suggest it is reliable for churn prediction, but there is still room for improvement, particularly in reducing false positives. Effective Handling of Imbalanced Data:

Using SMOTE to balance the training data has proven effective, resulting in a model that performs well across both churn and non-churn classes. Business Recommendations Proactive Retention Strategies:

Utilize the model to identify at-risk customers and target them with retention strategies such as personalized offers, loyalty programs, or enhanced customer service. Focus on those predicted to churn to prevent actual churn. Resource Allocation:

Given the moderate precision, consider a tiered approach to resource allocation. High-risk churners (high model confidence) should receive immediate and significant attention, while lower-risk cases (lower model confidence) can receive less intensive interventions. Further Model Tuning:

Continue to refine the model to improve precision while maintaining high recall. This could involve exploring additional features, fine-tuning hyperparameters, or employing different balancing techniques. Customer Feedback Loop:

Implement a feedback loop where the outcomes of retention efforts are tracked and fed back into the model. This can help in continually improving the model’s accuracy and effectiveness. Segment Analysis:

Perform a deeper analysis of different customer segments. Understanding which segments are more likely to churn and why can help tailor specific retention strategies to different groups. Communication Strategy:

Develop a communication strategy to address false positives. For customers who are incorrectly predicted to churn, ensure that retention efforts do not appear unnecessary or intrusive. Monitoring and Reporting:

Regularly monitor the model’s performance and impact on churn rates. Set up dashboards and reports to track key metrics, including the number of churners detected, retention rates, and overall model accuracy. A/B Testing:

Implement A/B testing for different retention strategies based on model predictions. This can help identify the most effective approaches and further refine intervention tactics. By implementing these actionable insights and business recommendations, the company can leverage the predictive model to effectively reduce customer churn, enhance customer satisfaction, and ultimately improve overall business performance.

Power Ahead


In [ ]:
y_test_pred = model_0.predict(X_test)    ## Complete the code to specify the best model
y_test_pred = (y_test_pred > 0.5)
print(y_test_pred)
63/63 [==============================] - 0s 1ms/step
[[False]
 [False]
 [False]
 ...
 [False]
 [False]
 [ True]]
In [ ]:
#lets print classification report
cr=classification_report(y_test,y_test_pred)
print(cr)
              precision    recall  f1-score   support

           0       0.79      0.77      0.78      1593
           1       0.18      0.19      0.19       407

    accuracy                           0.65      2000
   macro avg       0.48      0.48      0.48      2000
weighted avg       0.66      0.65      0.66      2000

In [ ]:
#Calculating the confusion matrix
make_confusion_matrix(y_test,y_test_pred)

Appendix¶

Non Linear relation ship¶

Scatter plot¶

In [ ]:
import seaborn as sns
import matplotlib.pyplot as plt

# Define the features to analyze
features = ['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary']

# Create scatter plots with trend lines
for feature in features:
    plt.figure(figsize=(10, 6))
    sns.regplot(x=ds[feature], y=ds['Exited'], scatter_kws={'alpha':0.5}, line_kws={"color":"red"})
    plt.title(f'Scatter plot of {feature} vs Exited with Trend Line')
    plt.xlabel(feature)
    plt.ylabel('Exited')
    plt.show()

Polynomial Features Created for Pair Plots¶

In [ ]:
from sklearn.preprocessing import PolynomialFeatures

# Select features for polynomial transformation
X = ds[['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary']]
y = ds['Exited']

# Create polynomial features
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Display polynomial feature names
poly.get_feature_names_out(input_features=X.columns)
Out[ ]:
array(['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary',
       'CreditScore^2', 'CreditScore Age', 'CreditScore Tenure',
       'CreditScore Balance', 'CreditScore EstimatedSalary', 'Age^2',
       'Age Tenure', 'Age Balance', 'Age EstimatedSalary', 'Tenure^2',
       'Tenure Balance', 'Tenure EstimatedSalary', 'Balance^2',
       'Balance EstimatedSalary', 'EstimatedSalary^2'], dtype=object)
In [ ]:
sns.pairplot(ds[['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary', 'Exited']], hue='Exited')
plt.show()

Observations¶

No Clear Pattern

In [ ]:
import pandas as pd
from sklearn.metrics import recall_score

# Initialize empty dataframes to store recall values
train_metric_df = pd.DataFrame(columns=["recall"])
valid_metric_df = pd.DataFrame(columns=["recall"])
In [ ]:
def train_and_evaluate_model(model_name, model, X_train, y_train, X_val, y_val):
    # Train the model
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=1, validation_data=(X_val, y_val))

    # Predict on training set
    y_train_pred = model.predict(X_train)
    y_train_pred = (y_train_pred > 0.5).astype(int)
    train_recall = recall_score(y_train, y_train_pred)

    # Predict on validation set
    y_val_pred = model.predict(X_val)
    y_val_pred = (y_val_pred > 0.5).astype(int)
    val_recall = recall_score(y_val, y_val_pred)

    # Update the dataframes
    train_metric_df.loc[model_name] = train_recall
    valid_metric_df.loc[model_name] = val_recall
In [ ]:
# Example models (assuming these are defined correctly elsewhere in your code)
model_sgd = ...
model_adam = ...
model_adam_dropout = ...
model_smote_sgd = ...
model_smote_adam = ...
model_smote_adam_dropout = ...

# Train and evaluate each model
train_and_evaluate_model("NN with SGD", model_sgd, X_train, y_train, X_val, y_val)
train_and_evaluate_model("NN with Adam", model_adam, X_train, y_train, X_val, y_val)
train_and_evaluate_model("NN with Adam & Dropout", model_adam_dropout, X_train, y_train, X_val, y_val)
train_and_evaluate_model("NN with SMOTE & SGD", model_smote_sgd, X_train_smote, y_train_smote, X_val, y_val)
train_and_evaluate_model("NN with SMOTE & Adam", model_smote_adam, X_train_smote, y_train_smote, X_val, y_val)
train_and_evaluate_model("NN with SMOTE, Adam & Dropout", model_smote_adam_dropout, X_train_smote, y_train_smote, X_val, y_val)
In [ ]:
print("Training performance comparison")
print(train_metric_df)

print("Validation set performance comparison")
print(valid_metric_df)